ABSTRACT 


Students from regular classes and Junior Adaptation 
classes were rated by their teachers on the Walker Problem 
Behavior Identification Checklist (WPBIC). A total of 188 
subjects, divided evenly between the two groups, were used 
and the results were compared. 

On the full scale and on 4 of the 5 subscales 
(Acting-out, Distractability, Disturbed Peer Relations and 
Immaturity) the mean scores of the two groups showed 
differences significant beyond the .001 level of confidence. 
The remaining scale (Withdrawal) showed differences between 
the groups significant beyond the [illegible] level of confidence, 
but the results here were felt to be seriously compromised 
by a nested teacher-related effect on scores. A similar 
nested teacher-related effect was found in relation to the 
full scale but was overwhelmed by real differences between 
the groups. 

Frequency polygons for the full scale and all 
subscales were generated and the amount of overlap around 
Walker's cut-off points for problem behavior was rated. On 
[illegible], substantially more subjects 
in the Junior Adaptation classes were rated as possessing 
problem behavior than subjects in the regular class group. 
It was noted that the hypotheses in regard to these polygons 
may have been too restrictive in their setting of limits. 

Groups matched for age and sex were selected from 
the two source groups and compared for differences of scores. 
On all but Scale II (Withdrawal) and Scale V (Immaturity), 
differences beyond the .001 level of confidence were noted. 
On Scale V (Immaturity) differences beyond the .05 level of 
confidence were noted, while on Scale II (Withdrawal) no 
significant differences were found. It was determined that 
these groups were similar to the source groups. 

The Junior Adaptation sub-group used above was 
compared to a sub-group from Walker's original sample, this 
sub-group being identified as possessing disturbed behavior. 
The sub-group from the present study and Walker's sub-group 
appeared to be very similar in that there was no statistically 
significant difference in their mean scores or variance. 

Small, negative correlations were found between 
subjects' ages and checklist scores and between length in 
Junior Adaptation classes and checklist scores. These 
results were not considered significant, however. 

The results of the findings were discussed and 
certain conclusions were drawn regarding the usefulness of 
the instrument as a device for screening students for 
evidence of problem behavior. 

I INTRODUCTION 


Background and Problem 

When does a child's negative behavior cease to be 
merely annoying and begin to be problematic? Within this 
seemingly innocuous question are certain definitional 
problems that affect the attempt to find an answer. 

Of immediate concern is the word "negative." In 
Allee (1958) the term "negative" is defined as ". . . 
expressing denial, prohibition, or refusal; lacking positive 
qualities; not positive" (p. 250). However, behaviors that 
may annoy a teacher in one classroom, e.g. attention seeking, 
extreme perfectionism, even aggressiveness, may not be 
perceived as annoying by another teacher or the child's 
parents. It may be that, at the time, the annoyed teacher 
is hyper-critical, impatient, or simply busy and perceives 
the behavior as annoying. It may also be that the behavior 
in question is, in fact, serious enough to warrant outside 
attention, but not recognized as such by the parents' 
untrained eyes or the teacher's differing standards. 

Contained within the question is also the matter of 
defining the word "problematic." What constitutes problem 
behavior? Is it merely the manifestation of overt "negative" 
behaviors? Certainly, the child whose behavior is overtly 
aggressive, or immature, or reflective of distractability, 
may be considered to have a behavior problem. However, 
should we not also consider as problematic the behavior 
of the child who has sat in the classroom from September to 
May without uttering a sound? Perhaps, because this child 
has never defied the teacher, or struck another student, the 
teacher assumes that there is no problem because the child 
makes no demands on the teacher's time. 

The point to be made here is that the evaluation 
and identification of problem behavior can be a very 
subjective and difficult matter. The concerned and dedicated 
teacher may be hesitant to "label" a child on the basis of 
a subjective evaluation, for fear that the "label" may 
follow the child even if it no longer applies. It is not 
hard to visualize the potentially devastating long term 
effects that mis-identification or faulty "labelling" of a 
child could have. On the other hand, a child whose problem 
behavior goes unchecked because of a teacher's fear of 
mis-identification will not only suffer personally but cause 
others to suffer as well. 

The need, then, exists for a relatively simple, 
comprehensive, valid and reasonably reliable instrument which 
can help to determine if a particular child's pattern of 
behavior warrants action by outside personnel, or whether 
this behavior can be dealt with successfully by the regular 
class teacher. Also, if factors related to the particular 
behavior type can be isolated, then decisions can be made 
as to a strategy for remediation. Such an instrument would, 
of course, form only a part of the process for selecting 
children with serious behavioral problems who would require 
specialized instruction and remediation. 

The Walker Problem Behavior Identification Checklist, 
developed in 1970, has been claimed to be useful in 
identifying children with behavioral problems. Based on a norming 
sample of 534 Grade 4, 5 and 6 students, the instrument is 
said to identify behavior problems along 5 dimensions 
(acting-out, withdrawal, distractability, disturbed peer 
relations, immaturity). Whether or not the instrument is 
generalizable to the North American population as a whole 
remains to be adequately proven. 

If, in the present study, Walker's checklist were to 
distinguish between children in regular classes and those 
in classes for children with behavior disorders, its 
usefulness as a device for detecting behavioral problems 
could be considered. As such, the instrument could then be 
used by teachers to assist in making decisions regarding 
their students' behavior. 

If, on the other hand, the instrument were to fail 
to distinguish between children in the two groups, serious 
questions could be raised concerning the instrument's 
usefulness. In fact, the instrument should perhaps then be 
subject to a thorough re-examination in terms of 
standardization, validity, and design. 

An alternative to weakness in the instrument, in the 
event of failure to discriminate, would be a consideration 
of weakness in the selection process used to place the 
behaviorally disturbed children. On the surface, a weakness 
in this process would seem to be a relatively remote 
possibility, since these children have been subjected to 
extensive screening before placement, and each case has been 
carefully considered. This would not, however, rule out 
the possibility of a problem in this area. 

The process of successfully identifying children 
with behavior problems, and determining the type and severity 
of the problem, has been one of some concern. An instrument 
which may be of some use in this process, has been available 
for some time, but could benefit from investigation as to 
its exact usefulness and limitations. 

The main focus of this study will be to investigate 
the instrument's ability to distinguish between children 
who have been identified as possessing some behavioral 
disorders and those not so identified. 


Purpose of the Study 

The present study will attempt to determine the 
extent to which the Walker Problem Behavior Identification 
Checklist can differentiate between children in a Grade 4, 
5 or 6 setting who have not been identified, formally, as 
possessing behavior problems and children in Junior 
Adaptation classrooms, who are approximately the same age, 
but have been identified as possessing problem behavior. 
Since the checklist claims the ability to discriminate 
between problem and non-problem behavior, it should show 
differences of a significant nature between these two groups 
of children. 
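
As an illustration only, a comparison of two groups' mean scores 
might be sketched in Python as follows. The scores are invented, 
the function name welch_t is hypothetical, and Welch's t statistic 
is an assumed stand-in; it is not presented as the analysis 
actually carried out in this study. 

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb                        # squared standard error
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical total checklist scores, invented for illustration only.
regular = [3, 5, 2, 8, 4, 6, 1, 7]
adaptation = [18, 25, 14, 30, 22, 19, 27, 21]

t, df = welch_t(adaptation, regular)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t would indicate that the second group's mean 
lies well above the first group's, relative to the variability 
within the groups. 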

In addition to determining whether or not this 
instrument can detect overall differences between the groups, 
it is of interest to determine the percentage of children 
in each group that, according to the Checklist, are 
misclassified. That is, what percentage of children in the 
regular classes are identified by the criteria of the 
checklist as possessing problem behavior and, of possibly 
greater importance, what percentage of children in Adaptation 
classes are identified as not possessing problem behavior. 
Obviously, an inordinate percentage of misclassified children 
would raise questions about the discriminatory properties of 
the checklist, or perhaps more seriously, the procedures 
used to select students for inclusion in Adaptation classes 
in the local system. 
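
The two percentages described above can be illustrated with a 
short Python sketch. The scores and the cut-off value are invented 
for the example, the function name is hypothetical, and treating a 
score at or above the cut-off as "problem behavior" is an 
assumption rather than a statement of Walker's scoring rule. 

```python
def misclassification_rates(regular_scores, adaptation_scores, cutoff):
    """Return (% of regular pupils flagged, % of adaptation pupils not flagged)."""
    # Assumption: a score at or above the cut-off counts as problem behavior.
    flagged_regular = sum(1 for s in regular_scores if s >= cutoff)
    unflagged_adaptation = sum(1 for s in adaptation_scores if s < cutoff)
    return (100.0 * flagged_regular / len(regular_scores),
            100.0 * unflagged_adaptation / len(adaptation_scores))


# Hypothetical scores and cut-off, invented for illustration only.
regular = [2, 5, 9, 14, 23, 7, 3, 11, 30, 6]
adaptation = [25, 31, 18, 40, 22, 12, 27, 35, 29, 21]

rates = misclassification_rates(regular, adaptation, cutoff=21)
print(rates)
```

The first value is the percentage of regular class children the 
checklist would identify as possessing problem behavior; the 
second is the percentage of Adaptation class children it would 
not so identify. 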

The ability or inability to differentiate between 
the two groups should yield an estimate of construct 
validity, in the form of contrasted groups validity. 

II REVIEW OF LITERATURE 


A Model for the Development of Behavior 

Before an examination of problem behavior can be 
initiated, a model for behavior should first be established. 
For the purpose of this study, behavior will be considered 
to develop in terms of the Social Learning model outlined 
by Bandura and Walters (1963) and modified somewhat by 
Bandura (1973). 

The essence of Social Learning theory centres 
around the notion that behavior is a learned, rather than 
innate phenomenon. While not excluding the possibility of 
spontaneous behavior, based on available cues in the 
immediate environment, Bandura and Walters suggest that much, 
if not most, learning occurs from observing a model perform 
or hearing a model verbalize the performance of a behavior. 
Especially if the model is rewarded for the particular 
behavior, the learner tends to learn much more rapidly and 
easily than if no model was present. According to this 
view, the model's behavior is copied by a series of 
successive approximations until the behavior is mastered. 

Bandura (1973) qualifies this position somewhat to 
account for behavior not being learned even under all 
possible favorable conditions. He states: 

. . . Exposure to models, even prestigious ones, 
does not automatically produce matching performances. 
In any given instance absence of imitative behavior 
may result from faulty observation, retention losses 
due to inadequate symbolic representation and 
rehearsal, motor deficiencies, or simply unwillingness 
to perform the exemplified behavior because of its 
unfavorable consequences (p. 72). 


This somewhat enlarged view accounts more adequately 
for the absence of learning of modelled behavior than did 
the earlier position. It grants more scope for individual 
differences in the learning process and attempts to specify 
some possible reasons behind the absence of learning in 
some cases. 

Bandura (1973) goes on to point out certain effects 
that modelled behavior can have on behavior in the observer: 

First . . . observers can acquire new patterns of 
behavior through observation. A second major function 
of modelling influences is to strengthen or weaken 
inhibitions of behavior that observers have previously 
learned. Inhibitory and disinhibitory effects are 
largely determined by observation of rewarding and 
punishing consequences accompanying model's responses. 
The actions of others also serve as social prompts 
that facilitate similar behavior in observers. Response 
facilitation effects can be distinguished from 
observational learning and disinhibition by the fact that no 
new responses are acquired, and the appearance of 
analogous actions is not attributable to weakening of 
inhibitions because the behavior is socially acceptable 
and, hence, unencumbered by restraints (pp. 68-69). 

The model can, thus, show new behavior, strengthen or 
weaken inhibitions toward behavior by the observer, or 
present social cues or situations where a particular behavior 
is acceptable. The role of the model in Social Learning 
theory can be considered to be crucial to the development 
of behavior. 

Once the behavior has been learned or modelled its 
survival depends upon the effect of reinforcement or reward. 
A reinforced (rewarded) behavior can be expected to recur if 
a situation like the one in which the behavior was learned 
recurs. A behavior that is reinforced on each occurrence or 
emission can be learned very quickly, but can also be 
extinguished (unlearned, for want of a better word) if 
the reinforcer is no longer presented. Behavior whose 
reinforcement occurs irregularly in terms of time between 
presentation of the reinforcer, or in terms of the number of 
emissions of the behavior, or a combination of the two, 
will be learned more slowly than behavior reinforced each 
time it occurs. However, behavior learned in this way also 
resists extinction to a high degree and can be expected to 
persevere over a considerable length of time, even in the 
absence of any reinforcement. 

This basic position has one major qualification 
expressed by Bandura (1973): 

Response consequences (i.e. reinforcement 
contingencies) . . . have weak effects on behavior 
when the relationship between one's actions and 
outcomes is not recognized. On the other hand, 
awareness of conditions of reinforcement typically 
results in rapid changes in behavior, which is 
indicative of insightful functioning. People who are 
aware of what is wanted and value the contingent rewards, 
change their behavior in the reinforced direction; those 
who are equally aware of the reinforcement contingencies 
but who devalue the required behavior or reinforcers 
show little change; those who remain unaware achieve, at 
best, small increment in performances even though the 
appropriate responses are reinforced whenever they occur 

Two salient points are raised here that distinguish 
Social Learning theory from the strict behaviorist view one 
might expect from Skinner (see Baldwin, 1967). First, as 
compared to the behavioral view, cognition is recognized as 
a key aspect of the learning process. That is, the learner 
must be aware of the reward contingencies affecting a learning 
situation if he is to make significant progress. Second, 
valuation becomes a factor. Even if the learner is aware 
of the reward contingencies, he must value the reward before 
he will make gains. This valuing of the reward may be termed 
motivation, for want of a better word, and differs from the 
behavioral concept of motivation which sees it as a state of 
need or deprivation. The Social Learning view adds a level 
of sophistication to the concept, again through its cognitive 
dimension, that seems to be lacking in the other view. 

Social Learning theory views the acquisition and 
emission of learned behavior in terms of generalization and 
discrimination to relate how one behavior may or may not 
be used in a variety of differing though similar situations. 
To generalize a behavior one emits a particular behavior 
in a situation whose cues are very similar to that in which 
the behavior was first learned. As these cues or signals 
for behavior are more dissimilar, the particular behavior 
is less likely to be emitted. If this process breaks down, 
an individual may overgeneralize and emit a behavior totally 
inappropriate to the situation. By the same token, and 
concurrent with the generalization process, a very strict 
process of discrimination must occur so that the individual can 
determine whether a particular behavior is appropriate 
or inappropriate to a given situation. If this process does 
not function properly or if the cues or modelled behavior 
are inappropriate or misunderstood, inappropriate behavior 
will likely result. In a very real sense, generalization 
and discrimination are two sides of the same coin. A 
breakdown in either process will affect the other. In fact, 
Bandura (1973) tends to refer to the two as parts of the 
same whole. 

Various other effects operate to influence the 
learning of the individual. An individual who is highly 
dependent, that is, one who constantly looks to outside 
sources for reinforcement, will be much more susceptible to 
social influence than a person who is highly independent in 
his actions. It should be noted here that both dependency 
and independence are, in themselves, learned behaviors and 
as such can be altered through behavior modification as can 
most behavior according to Bandura and Walters. 

Sex differences form an integral part of the social 
learning process. Social demands have traditionally been 
different for girls than for boys and the extent to which 
these demands are learned will affect the individual's 
response to a given set of situational cues. 


The sex of the individual who is the model for 
behavior can also be of considerable importance. The 
modelling effect, i.e. extent of imitation, will be greater 
if the model is the same sex as the learner. 

The performance of learned behavior is also more 
likely if the model is a "high prestige" individual. If 
the model is held in high regard by the learner, then his 
behavior is more likely to be imitated. Again the concepts 
of cognition and valuation are considered here as being 
quite important. 

If there is a state of emotional arousal, i.e. some 
form of excitation, anxiety, or interest, learning seems to 
be facilitated over a state of non-arousal or neutral affect 
to the situation. However, this aroused state is only a 
facilitator of learning within certain limits. If an 
optimal level of arousal is passed, then the aroused state 
can, conceivably, block learning because of the individual's 
concentration on the source of arousal and its elimination, 
rather than the learning situation. 

All of the above-mentioned factors are, according to 
Bandura and Walters, based on some prior experience or 
previous learning in a similar setting. This may also be 
said of conflict, i.e. approach-avoidance situations where 
two mutually exclusive options are present, and of 
displacement, i.e. the transfer of a desired response from 
one object to another. 


Social Learning theory has undergone some evolutionary 
change in regard to views on punishment and non-reward. 
Whereas Bandura and Walters (1963) saw punishment as 
inhibiting behavior without removing it from the behavioral 
repertoire and non-reward as extinguishing behavior, Bandura 
(1973) presents a somewhat more thoroughly considered view: 


There are two principal ways in which negative 
sanctions inhibit forbidden actions. Repeated 
punishment for aggressing toward certain persons 
places or things endows them with fear arousing 
value. As a result, inclinations to aggress toward 
these threats evokes fear, which motivates inhibitory 
controls. 

The effectiveness of punishment in controlling 
behavior is determined by a number of factors. Of 
special importance is the level of reward achieved 
through (aggressive) conduct and the availability 
of alternative means of securing the desired goals. 
The likelihood that aggression will be punished, the 
nature, severity and duration of the aversive 
consequences, and the time elapsing between aggressive 
actions and negative outcomes also determine the 
suppressive power of punishment. Additionally, the 
level of instigation to aggression and the characteristics 
of the prohibitive agents influence how 
aggressors will respond to being punished (pp. [illegible]). 


While the above statements specify aggressive behavior, 
various forms of undesirable behavior could be substituted 
where the words "aggression" or "aggressive" appear and the 
statements would still apply. 

The appeal of Social Learning theory lies in the 
fact that its emphasis on imitative learning of a model's 
behavior and the importance of schedules of reinforcement 
(the manner and timing of reinforcers) applies equally well 
to both normal and problem behavior. 


It is within the Social Learning framework that 
Walker's Problem Behavior Identification Checklist appears 
to have its foundation. The model for behavior acquisition 
and elimination should give some indication of the 
developmental nature of behavior acquisition and change. As noted 
above, the model is derived primarily from Bandura and 
Walters (1963, pp. 1-32), except where otherwise indicated, 
and has been incorporated primarily to indicate the author's 
general position on behavior theory. 

This study is not overly concerned with the acquisition 
of behavior, but rather with determining whether or 
not certain operational descriptions of behavior are useful 
in describing problem behavior. 


Review of Relevant Literature 

A number of sources in the current literature on 
teacher rating of student behavior and attributes will be 
examined briefly. An attempt has been made to utilize 
recent articles as much as possible. Since the matter of 
teacher rating of pupil behavior is central to this study, 
this will form the major focus of this review. 

In an apparent attempt to streamline the process of 
identifying emotionally disturbed children at the elementary 
level, Maes (1966) undertook a study which showed that 
emotionally disturbed children could be identified as 
effectively through the use of a teacher rating scale and a 
group I.Q. test as by a battery of measures including 
mathematics achievement, reading achievement, a modified 
sociometric technique (namely class play) and a self concept 
inventory. The evidence, here, suggests that if a teacher 
has access to an objective rating instrument, and is trained 
in the use of it, his classroom observations can be 
effectively used as a major means of identifying behavioral 
attributes. The results of this study, though encouraging 
and supportive of Maes' hypothesis, do not appear to have 
been validated by further research either by Maes or by any 
other researcher. Without the support of successful 
replication, Maes' research, though interesting, is of 
limited usefulness. 

Ebbeson (1968) studied kindergarten teachers' 
rankings of their students' later academic achievement. It 
was found that the teachers predicted quite accurately the 
academic achievement of these students in the early grades 
of school. The same result was claimed in the prediction of 
future achievement with two successive kindergarten classes. 
The successive repetition of Ebbeson's initial results 
lends some support to his conclusion about the effectiveness 
of teacher prediction in this case. However, kindergarten 
teachers do not live in a vacuum and though it was not 
discussed by Ebbeson, it is entirely possible that these 
teachers could have passed their views on students to 
successive teachers, either orally or by written comment, 
thus biasing the expectations for academic achievement on 
the part of those later teachers. This view is entirely 
speculative, but, given the nature of teacher to teacher 
communication and the nature of cumulative student records, 
is one that should be considered. 

In an observational study of 10 normal children and 
a larger group of children with behavior disorders, Werry 
and Quay (1969) presented evidence to suggest that a method 
of direct behavioral observation in the classroom is reliable, 
that it discriminates between normal and disturbed behavior 
and gives information on the nature of the maladjustment. 
This work, based on largely individual items of observed 
behavior, suggests that real differences are observable 
between normal and behaviorally disturbed children. An 
obvious problem here is the relatively small size of the 
group identified as normal. A group of 10 subjects can do 
little more than to suggest general trends and can hardly 
be used effectively as a standard for normalcy. 

Bryan and Wheeler (1972) found that systematic 
observation of learning disabled children revealed that 
these children spent significantly less time in task 
oriented behavior than did non learning disabled children. 
They stressed, however, the importance of knowing what to 
look for. Even though a child was looking at a book, 
something most teachers would consider on-task behavior, he 
might well not have been reading it. The looking without 
reading would be off-task behavior, and Bryan and Wheeler were 
careful to point out that directed observation 
techniques were needed to perceive behaviors 
accurately. This seems to be a crucial point and may provide 
a key to the somewhat inconsistent research results obtained 
in studying observations by one individual on others. Not 
only must the observer be sensitive to the group or 
individual under observation, but, to be effective, he must also 
know what he is looking at and for. In this area, a 
structured observational guide would have its real value. 

In a separate study with McGrady, Bryan (1972) 
analyzed teacher ratings of 183 boys labelled as having 
learning problems and 176 normal learners. The analysis 
indicated that teachers consistently rated problem learners 
lower on each area of the scale used than they did normal 
learners. Validity was established by comparing the groups 
identified by the Pupil Behavior Rating Scale with reading 
and WISC vocabulary scores. On each measure, the learning 
disabled group scored significantly lower than the normal 
children. The conclusion formed from this study was that a 
teacher checklist, in this case the PBRS, could provide an 
efficient and economical measure for use in screening for 
learning disability. The authors did, as a cautionary note, 
suggest further study of the validity of the PBRS and of 
the basis upon which teachers make their discriminations. 
This suggestion for further research does not necessarily 
detract from the value of what appears as a well planned and 
executed study which concludes in favor of the utility of a 
teacher rating scale. 

Bullock and Brown (1972) compared teacher-reported 
behavior disorders of 1189 special education students to 
results on the Behavior Dimensions Rating Scale. A high 
correlation was found between factors on the BDRS and 
problems stated as serious by the 112 teachers involved in 
the research. The findings were used to conclude that 
teachers appear to have the ability to observe and judge 
student behavior patterns effectively. The sample size 
used here seems to lend an air of authority to the study, 
although the matter of exacting validation of the BDRS, or 
the lack of it, remains a problem. This problem pervades 
much of the research involving the use of checklists. 

Cowgill, Friedland and Shapiro (1973), in a study 
using 37 kindergarten boys who had been identified as being 
learning disabled by the Massachusetts State Department of 
Education and 37 "normal" kindergarten boys, found that 
their teachers' evaluations differed significantly on all 
but one of 7 trait categories and on all general behavior 
categories used in the study. The results were taken as 
evidence for the value of teacher reports in identifying 
learning disabled children. No mention was made as to 
whether or not the teachers were aware of how each child 
was "labelled." If they were unaware of a child's 
classification, this research could be considered as useful 
support for teacher awareness of pupil attributes. If, 
however, they were aware of the children's classification, 
this knowledge could very easily have biased the 
observations made. The bias, if present, could tend to lead the 
teacher to perceive behaviors in such a way as to support 
the classification. This doubt somewhat compromises the 
significance of this study. 

Garner and Bing (1973), in a study examining 
differences in pupil-teacher contacts, attempted to correlate 
verbal teacher-pupil exchanges and teacher ratings of 
pupils. Students between 7 and 8 years old, from 7 classes, 
were used. The finding of interest to this study was the 
high degree of agreement in teachers' ratings of specific 
pupil attributes, regardless of the amount of contact. It 
is left to speculation as to whether this agreement reflects 
similarities in overall attributes of students, or simply 
similarities in the perceptions of a group of teachers. 

Hammet and Batchelor (1973) described the advantages 
of a behavioral rating questionnaire which parents and 
teachers could complete to provide more precise and 
comprehensive data than that obtainable by routine clinical 
observation. Again the point was made (as by Bryan and 
Wheeler (1972) and Bryan and McGrady (1972)) that the 
questionnaire provided direction and structure on which to 
base observations. In this way, they claimed, the precision 
available through intensive interaction with the subject 
could be maximized and subjective judgments minimized. This 
interaction effect is certainly valuable in the same way 
that the structure of the questionnaire is valuable, if that 
questionnaire has been adequately validated. In this case, 
the validation of the questionnaire does not seem to have 
been adequately handled and this fact tends to minimize the 
value of the findings. 

Hartlage and Lucas (1973) used 1132 children as 
subjects in the validation of an approach to group screening 
for reading disabilities in the first grade. A correlation 
coefficient of .83 was achieved between teacher rankings of 
the children and the reading levels achieved by the students 
on the Wide Range Achievement Test. For comparison with 
the WRAT, 2 teachers' rankings were used. The result here, 
reflecting good levels of accuracy, was used to suggest that 
the trained observer, familiar with his subjects and the 
concepts under observation, can be considered likely to be 
quite accurate in his observations. The apparent 
thoroughness of this study gives its conclusions a good deal of 
merit. 

Using a scale developed to measure 11 behavioral 
attributes, Lambert and Hartsough (1973) correlated 
multiple-teacher judgments of pupil characteristics. It was found 
that these multiple judgments correlated from .70 (often 
sick or upset under stress) to 1.0 (fighting and quarreling). 
On the basis of these high correlations between teacher 
judgments, Lambert and Hartsough suggested that teachers 
are quite able to perceive and isolate the behavioral 
attributes of their students. The multiple rater technique, 
if and when practical, appears to be a useful method to 
determine a measure of reliability of an instrument, although 
the matter of establishing validity may still be elusive. 
This study appears to have picked useful attributes for 
study without being overly restrictive or unduly open-ended. 

In an attempt to predict potential learning problems 
in low Socio-Economic Status rural children, Lessler and
Bridges (1973) found a correlation of .75 (p < .001) between
results on the California Achievement Test and teacher
ratings of pupil performance. This fairly high level of 
predictability was not found on other measures used. It 
appeared that the Metropolitan Readiness Test was the best 
predictor of potential learning disabilities. This research, 
though lending no great strength to the argument for teacher 
rating of pupils' performance, at least suggests that, in
some areas, teachers can predict future performance based on
their observations. 

Maguire's (1973) work showed no significant 
differences between child care workers' ratings on an abridged 
Devereux Adolescent Behavior Scale and self ratings made by 
female adolescents with behavior disorders, on themselves, 
using the Teen-Agers Self Awareness Test. This work, though
not conducted in an educational setting as such, suggested
that an observant individual, using a reasonably objective 
instrument could be expected to accurately detect and 
evaluate the behavior patterns of other individuals. 

In a study of slum pre-schoolers, Richards (1973) 
found a moderate but significant product-moment correlation 
between teacher ratings and the Peabody Picture Vocabulary 
Test I.Q.'s. It is possible that a higher coefficient of
correlation may have been achieved had the teachers had 
more experience with the children tested in the study. 
Nonetheless, it is of interest that some correlation exists, 
even at this early level, between a teacher's view of a 
child and an objective instrument's evaluation of his 
intelligence. 

Richmond and Dalton (1973), in a study of 9-15 year 
old retarded students using the Coopersmith Self-Esteem 
Inventory, found that the child's self image, as reported 
on the Inventory was positively related to teacher evalua- 
tions of academic ability while teachers' ratings of social 
and emotional behavior could not be shown to correlate signif- 
icantly. While the inability of the teachers' rating of 
social and emotional behavior to correlate with the students' 
rating of selves is discouraging, this study does have value 
in showing that teachers do appear sensitive to their 
students' behaviors in some ways. It would appear that a 
student's self image, and the behavior affected by it,
could relate to his academic performance to some extent.
This study also forces recognition of the fact that teachers,
as observers, are far from infallible and are very likely
most sensitive to achievement related behaviors.

In investigating the accuracy of teacher predictions on
learning performance, Wang (1973) had 2 teachers estimate the
Primary Education Project results of their classes, which were
made up of 12 four-year-olds and 13 kindergartners. The first
teacher had a mean accuracy of 67.7% (50.0% to 88.9%), while
the second had a mean accuracy of 76.2% (65.7% to [illegible]).
These results, though perhaps not as accurate as anticipated
by Wang, were significantly different from chance values.
The implication here is that teachers' predictions, though
not 100% accurate or totally consistent, do reflect a certain 
level of awareness of their students' characteristics and 
capabilities. 

The available evidence points toward a structured 
rating of behavior as being a potentially reliable and valid 
technique of behavioral observation. Structured rating 
allows for objectivity and frees the observer, or should
free the observer, from making value judgments. Although
free observation, by teachers and others, has had moderate
apparent success, the more structured form of rating appears
to be more effective.

Great claims have been made for some of the methods 
described above. However, most of the research above has
not been replicated in any way, so the validity of the
results is in question. Also, many of the rating scales
devised by authors have not been thoroughly validated, thus 
casting doubts upon exactly what they measure. Despite these 
limitations and those discussed above, there does seem to be 
a place for teacher observation of student characteristics 
and attributes. It is in the realm of revalidation and 
replication that further research is indicated in much of the
work discussed above.


Ratings and the Criteria of Ratings 

Swift and Spivak (1969) acquired 298 ratings of fifth
grade achievers and underachievers. The achievement criteria 
used were subtest scores on a group test and teacher assigned 
report card marks. An analysis of the relationship between 
classroom behavior and the achievement criteria indicated 
that when a child was underachieving the fact was evident in 
both grade or test scores and general functioning in the
classroom. Underachievers, it was shown, were clearly 
different from achievers in manifestations of overt maladap- 
tive behaviors. The authors pointed out that the findings 
were particularly true when the achievement criterion used
was the teacher's judgment of the quality of the child's
efforts. This would suggest that there may be a relationship 
between the criteria of achievement and the rating of 
behavior. The objective criteria used showed similarity to
the teacher's subjective grading, but it is left to
conjecture whether the rating based on subjectively graded
achievement reflects behavior or whether the behavior
resulted from expectation.

The point to be made here is that results seem
to vary based on criteria and that subjective criteria, e.g.,
grades, may be biased or biasing. The more objective the
criteria, the more useful the results.


Factors Relating to Raters 

In addition to concerns about the efficacy of 
behavior rating, rating scales and criteria, questions arise 
about factors affecting the rater. Such questions as the 
rater's attitudes towards the subject being rated and sex 
differences among raters deserve some consideration and will 
be dealt with briefly.

Both Grgin (1969) and Walker (1970), in his initial
work on the WPBIC, suggest that no significant sex differences
appear, on the part of teachers, in the rating of pupil
knowledge or behavior. Both authors indicate that in using
rating categories and exercising rigor in their ratings, male 
and female teachers show no real differences. The possible 
bias, either for or against same sex subjects, does not prove 
to be a relevant factor in rating by teachers. 

Also representative of work examining the relation-
ship between teachers' attitudes and their ratings of pupil
behavior is Willborn's (1972) study comparing teachers'
scores on the Minnesota Teacher Attitude Inventory to their
rating of pupils on the Behavior Maturity Scale. From this
work, it appears that a teacher's attitudes have little or
no bearing on the ability to rate students objectively.

To summarize briefly, there is a pattern of evidence, 
though perhaps not conclusive, suggesting that teachers are 
capable of observing and recording the behavior of their 
students, regardless of their own sex or attitudes toward 
the students. These observations may be more useful if a
structured, objective instrument is used.

III RATIONALE


Based on the discussion presented in the review of 
literature relating to ratings and rating scales, there
appears to be considerable scope for follow up research on 
these types of instruments, since many scales used, including 
the Walker Problem Behavior Identification Checklist (WPBIC) 
have not been subjected to any form of subsequent study. 

Given Walker's initial research, which appears 
thorough, it was decided to study the checklist in terms of 
one form of validity. The re-estimation of contrasted groups 
validity was thus chosen as the focus of study. The method 
used was based on Winer (1962, pp. 89-92). The reason for 
this form of study was the existence of two groups of 
potential subjects with relatively well known characteristics. 
A group of subjects identified as behaviorally disturbed was 
available and in the same schools other children were 
available who were not so identified. It was felt that if
the instrument could distinguish between these two groups, 
the desired estimate of validity would have been achieved. 

With the above in mind, the subjects were selected 
and rated. This rating yielded full scale scores and scores 
on 5 factors used on the scale. These scores were then 
compared and distributed to determine where differences
occurred, the direction of differences and any potential
weaknesses in the discriminatory powers of the instrument.

To rule out sex differences in the scores, two 
subgroups were formed which were matched for age and sex 
and compared. These two groups were drawn from the original 
groups used in the study. 

To summarize, additional research on the WPBIC
appeared warranted and the estimation of contrasted groups
validity appeared to be the most fruitful area of study in 
terms of the instrument's probable future use as an aid in 
identifying behaviorally disturbed children. Various 
approaches to this estimation were decided upon to obtain a 
relatively clear picture of the instrument's discriminatory
properties.


Definition of Terms 

1. Problem Behavior. Problem behavior will be opera-
tionally defined in terms of the specific behaviors
listed in the Walker Problem Behavior Identification 
Checklist (WPBIC) which comprise a set of 50 behaviors 
considered by raters and judges to be problematic, the 
list being drawn from observational reports by classroom 
teachers. Such behaviors would be broadly classified as 
acting out, withdrawal, distractability, disturbed peer
relations, or immaturity. These broad categories are
purported to cover the 50 behaviors included in the
Checklist. 

2. Junior Adaptation Class. A Junior Adaptation class
is one including children who are of normal intelligence, at 
least two years behind their peers in academic progress, and 
displaying behavior problems. These children are selected 
on the basis of extensive psychological evaluation, 
intelligence assessment and reports based on teacher observa- 
tion over an extended period of time. 

3. Acting-Out. Acting-out will be considered to be any
behavior which indicates defiance to, or outright refusal 
to, comply with teacher instructions. If a child overtly 
refuses, by statement or action, or both to carry out the 
teacher's instructions within a certain specified period of 
time, he would be considered to be defying the teacher.
This type of behavior could also include argumentativeness,
extreme affect in the face of frustration, overly aggressive 
acts, temper tantrums, distortion of the truth and undue 
approval seeking for tasks completed. 

4. Withdrawal. Withdrawal will be defined as the
absence of engaging in, initiating or responding to inter- 
actions with other children, whether of the same or opposite 
sex. The withdrawn student will also be considered the one 
who seeks not to draw any attention, either by the teacher or
other students, to himself.

5. Distractability. Distractability will be defined as
the inability to attend to task. The child who is considered
distractable will be the child who can be distracted from
task by small movements and noises, who seems to be staring 
into space for long intervals, who underachieves and does not 
complete tasks consistently, or is overly meticulous, who 
tends to regularly disturb other children engaged in
on-task performance, or who seems unable to stay on task or
within limits unless external control is applied. 

6. Disturbed Peer Relations. The child whose peer 
relations are disturbed will be defined as a child whose 
relations are entirely with same sex children, who stammers 
or stutters and appears unable to communicate effectively 
with peers, who comments that no one likes him, yet will not 
allow well done work to be displayed, or who often mutters 
unintelligibly to himself, rather than communicate with 
others. 

7. Immaturity. The immature child will display certain
age inappropriate behaviors such as enuresis, nervous tics, 
excessive nail biting, psychosomatic reactions to stress, 
listlessness, or tiredness. The immature child may also be 
shunned or avoided by others because of his age inappropriate 
behavior and may choose younger children as his playmates
since their interests and activities more closely approximate
his own than do those of his peers.

8. "On-Task" and "Off-Task" Performance. These terms
will be operationally defined as performance, either 
appropriate to the immediate learning situation as structured 
by the teacher (on-task) or inappropriate to that situation 
(off-task). In a broader sense, these terms can also be 
applied to situation appropriate or situation inappropriate 
behavior in a social context. 

9. Misclassification. For the purpose of this study,
misclassification will be taken to mean assignment of a
score above the critical or cut-off score on a factor or
full scale to a regular class student, or below the critical
or cut-off score for an Adaptation class student. This term
does not imply that an error has necessarily been made in
the identification of any particular subject.


Hypotheses 

1. The mean overall checklist scores and variances will 
show significant differences between the regular class 
students and those in the Junior Adaptation classes. 
Differences will be sought beyond the .01 level of confidence.
The two groups of subjects will be shown to be heterogeneous 
in terms of both mean scores and variances. 

2. The mean checklist score on each factor will show
significant differences between the regular class and those
in the Junior Adaptation classes. Differences will be sought
beyond the .01 level of confidence.

3. The two groups will not overlap in scores achieved
on the entire checklist beyond 15% of each group. Since
Walker (1970) indicated that 10 to 20% of school children
have behavior disorders, a percentage between these values
was chosen as the acceptable level of overlap around the
cut-off point between problem and non-problem behavior. If
greater than 15% of the regular class subjects are over the
cut-off point, and greater than 15% of the Adaptation class 
group are below the cut-off point, certain questions 
regarding the instrument's usefulness could be raised. 

4. The two groups will not overlap in scores achieved 
on each factor beyond 15% of each group.

5. Groups, matched for age and sex, will show signif- 
icantly different overall checklist mean scores. Differences 
beyond the .01 level of confidence will be sought.

6. Groups matched for age and sex will show significantly
different individual factor mean scores. 
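
The overlap criterion in Hypotheses 3 and 4 can be sketched as a simple count around a cut-off score. This is an illustrative sketch only: the function names, the treatment of a score equal to the cut-off as "over", and the example scores are assumptions, not part of the original study.

```python
def overlap_percentages(regular_scores, adaptation_scores, cutoff):
    """Return (% of regular class at or above cutoff,
               % of Adaptation class below cutoff).

    Assumption: a score equal to the cut-off counts as "over" it.
    """
    over = sum(1 for s in regular_scores if s >= cutoff)
    under = sum(1 for s in adaptation_scores if s < cutoff)
    return (100.0 * over / len(regular_scores),
            100.0 * under / len(adaptation_scores))


def within_overlap_limit(regular_scores, adaptation_scores, cutoff, limit=15.0):
    """True when neither group overlaps the cut-off by more than `limit` percent."""
    pct_over, pct_under = overlap_percentages(
        regular_scores, adaptation_scores, cutoff)
    return pct_over <= limit and pct_under <= limit


# Invented scores and cut-off for illustration only; not data from the study.
regular = [0, 2, 5, 12, 3, 1, 4, 6, 8, 9]
adaptation = [15, 22, 9, 30, 18, 11, 25, 14, 7, 19]
pct_over, pct_under = overlap_percentages(regular, adaptation, cutoff=10)
```

The same functions would apply to each factor score by passing that factor's own cut-off.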


Design 

The design of the study became fairly complex due 
to the nature of the selection of subjects for the study. 
The subjects were drawn from 5 schools in the local area. 
In each school two classes had members drawn from them for
inclusion in Group I and two classes or major parts of
classes were used to form Group II. An illustration of the
design of the groups for one school is presented in Figure
1.

Because of the design, a number of possible sources 
of variance arose which required separate analysis. This
analysis was performed "post hoc" and will be discussed 
later under Analysis Techniques. 

It should be understood that, although this analysis 
was not performed as part of the original hypotheses, its 
use was necessary to determine the extent to which factors
other than group differences affected the checklist scores.

IV METHODS


Subjects 

The subjects, who ranged in age from 8 to 13 years, 
with a mean age of 10 years, 1 month, comprised a total of 
188 students (126 male, 62 female) in two groups. Group I 
was made up of 94 students (47 male, 47 female) in 10 classes
in 5 schools. Group II was made up of 94 students (79 male,
15 female) in 10 Junior Adaptation classes in the same 5 
schools. The matching of the number of subjects from each 
school was done in an attempt to minimize the differences, 
between the groups, that might be attributable to differing 
school environments. It was hoped that this equality of
numbers from each school, across the groups, would hold the 
school environment factor fairly constant. 

Since the Adaptation subjects (Group II) consisted 
of intact classes, and due to the nature of obtaining
permission to use classes for study purposes, true randomness
of sampling was not possible for this group. This fact is
noted as a limitation of the study. The regular class 
(Group I) subjects were drawn randomly from their classes. 

Students from Junior Adaptation classes were selected 
to form one group of subjects because they satisfied at least 
one criterion for behaviorally disturbed children as outlined
by Walker (1970). The class in which they have been placed
represents a special educational provision related to their
problem behavior. 

No socio-economic data, as such, were gathered on 
these subjects and only a cursory examination was made of 
their records. This examination suggested that the subjects, 
generally, were well distributed through the society in 
terms of socio-economic status. Further examination may, or
may not, have shown a socio-economic status factor to be
present as an extraneous variable, although this appeared
unlikely. Some mention has been made of the remediation of
disadvantaged, low S.E.S. children (e.g. Sibley, Abbott and
Cooper (1969)) and that attitudes toward negative behaviors
vary with S.E.S. (e.g. Piliavin and Briar (1964)), but other
research has tended to support the concept that negative 
behaviors are not confined to status limits. A case in point 
was the aggression research in which aggressive behaviors 
were easily acquired by university undergraduates (see 
Berkowitz (1966)). Furthermore, Eron, Huesmann, Lefkowitz 
and Walder (1972) found that socio-economic status was less 
of a factor in one form of behavior problem (aggression)
than were television viewing habits.


The Instrument 
This study will examine the Walker Problem Behavior 
Identification Checklist (1970). This checklist was
standardized using 534 Grades 4, 5 and 6 students in 21
classes in the Northwestern United States.

The checklist was made up of 50 operational state-
ments, which were selected as the most frequently made of
300 statements submitted by teachers about problem behaviors.
The statements were then weighted for severity by a group of
judges, with weightings ranging from 1 for the least severe
to 4 for those items considered to reflect the most serious
problem behaviors.

Reliability was estimated using the Kuder-Richardson 
split-half method which yielded a split-half correlation of
.98. This correlation, according to Lindquist (1950), makes
individual separations among subjects possible. 
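
For reference, the Kuder-Richardson formula 20, which this study later applies in its post hoc analyses, can be sketched as below. This is a minimal sketch under stated assumptions: each item is coded 0 or 1 (behavior not observed or observed), the severity weights are ignored, population variances are used throughout, and the function name and the tiny response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(item_matrix):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    item_matrix: one row per subject, one column per item.
    KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / variance of total scores)
    """
    n_subjects = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)  # population variance of total scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_subjects  # proportion scored 1
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Invented responses for four subjects on three items; not data from the study.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(responses)
```

With weighted item scores, as on the WPBIC, a different coefficient would be needed; the sketch only shows the unweighted dichotomous case.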

Four estimations of validity were obtained by Walker, 
including contrasted groups, criterion validity, factorial
and item validity.

To establish contrasted groups validity, 38 children
meeting one of three criteria, (a) psychological, psychiatric
or clinical examinations; (b) specific educational provisions;
(c) home instruction due to inability to benefit from school
instruction, were matched with 38 students, not so identified,
in terms of age, sex, and grade. Differences, significant
beyond the .001 level of confidence were found between the 
means of the two groups. Walker thus claimed contrasted 
groups validity for the checklist. 


Criterion validity was estimated using a biserial
correlation to measure the relationship between checklist
scores and the construct of problem behavior as measured by
the three criteria listed above. From this biserial
correlation of .68 (standard error .039 and index of
predictive efficiency of .33) Walker claimed that the
instrument was useful in predicting behavioral disturbance
at the elementary level.

A complicated procedure involving factor analysis 
was claimed to yield five, relatively independent, factors: 
namely Acting-out, Withdrawal, Distractability, Disturbed 
Peer Relations and Immaturity. Only Acting-out and 
Distractability overlapped significantly, thus intimating 
some common variance here. 

Item validity indices were obtained for all checklist
items, which varied from .03 to .67. According to Walker,
these indicated a high correlation with the total score and
that the items discriminated between the upper and lower 27
percent of the sample in terms of scores. He also claimed
that all but three items, numbers 33, 36 and 47, appeared 
to indicate a relatively homogeneous set of behavior. 

Walker also found that boys scored significantly 
higher than girls on the checklist. This result may appear
surprising, but Bandura (1973) points out, in specific
reference to aggressive behavior that:

. . . boys, who are generally encouraged to
emulate feats of physical powers, spontaneously
performed all they had learned when they saw
aggression well received . . . . By contrast,
girls, for whom physical aggression is tradition-
ally regarded as sex inappropriate and, hence,
negatively sanctioned, kept much of what they
had learned to themselves . . . one should be
more concerned with predisposing conditions than
with predisposed individuals (p. 67).

The broad pattern of traditional socialization of 
children places physically aggressive and active roles within 
the expected roles of boys, not girls (see Lefkowitz, Walder,
Eron and Huesmann (1973)). It seems safe to assume, then,
that boys will show more types of overt behaviors and
problems, especially related to aggressiveness, than will
girls. The differing social expectations could partly
account for the greater number of general, overt, behavior
problems among boys. It was also found, by Walker, that no
significant differences appeared as the result of sex
differences on the part of the rater.

Thus far, Walker's claims for the checklist have been 
the only views presented. In a critical review of behavior 
rating scales, Spivak and Swift (1973) present the following 
conclusions regarding the WPBIC: 

As an initial screening device, the WPBIC appears
easy to use, probably taking no more than 5 minutes. . .
Data suggest that the scale is reasonably homogeneous.
Validity data, however, remain limited: the WPBIC total
score correlated with the single clinical criterion,
but this was probably due to the fact that the behaviors
employed in defining the criterion measure were the same
as those comprising the WPBIC.

Unfortunately, data regarding validity and
reliability of factor scores are not available,
and, in at least two instances misleading labels
are assigned to factors . . . it is impossible
to determine how the weighting process built
into the scoring system may affect the levels of
validity when scores are tested against a variety
of criteria (p. [illegible]).

The criticisms expressed by Spivak and Swift (1973), 
above, are well founded. This study will not, necessarily,
correct the problems, or answer the questions raised by
these authors. It will attempt to determine if the instru-
ment can discriminate effectively and meaningfully between
two supposedly different groups of children. Questions
beyond that framework will be left for future research.


Procedures 

According to the guidelines established by Walker 
(1970), a minimum two month observation period was set as a 
requirement for the teachers who rated the students on the 
checklist in order to ensure rater familiarity with the 
subjects. Raters were instructed to include only behaviors 
manifested during the two month observation period. All 
teachers who were involved in the study were given the same 
instructions for completing the checklists. These instruc-
tions included a review of the instructions displayed on the
checklist (see Appendix II), and personal instructions on 
the degree of value judgment allowable. It was indicated,
and stressed, that if a behavior described on the checklist
had been observed during the observation period, it was to
be recorded regardless of frequency. 

(a) Raters. The subjects were rated by their regular 
teachers after the two month observation period. Instruc-
tions to each rater, as noted above, were the same in form
and content. All raters were volunteers who agreed to assist 
in the study. Because of the voluntary nature of rater 
participation, these people may not have been totally
representative of all teachers in the local school system. 
The assumption is made, however, that they are similar to 
local teachers generally. 

(b) Data Collection. The data used for analysis were 
collected from teacher ratings of pupil behavior. These 
ratings were used due to the fairly comprehensive knowledge 
that these teachers possessed regarding their students' 
behavior and because teachers, generally, have been shown to 
be reasonably accurate, according to the literature, in 
their perceptions of student behavior. Outside observers 
were not employed because their observation period would 
necessarily have been less than the continuous two month 
period and this fact could have seriously confounded any 
results obtained. Furthermore, the instrument was designed 
as a device to be used by teachers, and outside observers 
would have rendered the results meaningless within the
framework in which the instrument was designed.

Checklists were in the schools for an average period
of just over one week and, when completed, were collected
for analysis.


Data Analysis Techniques 

1. Preliminary Analysis. In addition to analyses
performed to test the hypotheses, a hierarchal analysis was
performed to determine the influence of nested variables.
Each group from each school was examined, with a total of 140
subjects from the original groups examined (10 groups of
7 from each of the larger groups), in order to determine the
extent of a nested school variable and the extent of a
nested teacher variable. The procedure used was similar
to that described by Winer (1962), involving the
variance of each individual class grouping, each school
grouping (2 classes combined), each of the two main groups
and the determination of error variance which was used as
the basis for comparison. In this way, the confounding
effects of schools and teachers were determined as well as
the treatment effect (extent of real group difference) and
the variance attributable to error.


2. Hypothesis #1. The means and standard deviations 
(for calculating variance) were computed and compared for 
each group. The means were compared using a two-tailed t-
test between means of independent samples while an F-test
for differences between variances was used to determine
variance differences. As noted in the hypothesis, differences
were sought beyond the .01 level of confidence.

3. Hypothesis #2. The mean on each factor was computed 
for each group and compared using a t-test between means 
of independent samples. Differences, again, were sought 
beyond the .01 level of confidence.

4. Hypothesis #3. A frequency polygon was plotted 
showing the number of subjects in each group at the various 
score values (0 recorded, 1 and following scores recorded in
intervals of 4). The critical area of overlap was deemed to
be at score value 21 (T score 60), which Walker (1970) indica-
ted was the point allegedly separating normal from problem
behavior. 

5. Hypothesis #4. Frequency polygons were plotted on 
each of the sets of scores on each of the five factors.
The critical score in the separation of problem from
non-problem behavior, on each factor, is as follows:

(1) Acting-Out (Scale I)                 Between 7 and 8
(2) Withdrawal (Scale II)                5
(3) Distractability (Scale III)          6
(4) Disturbed Peer Relations (Scale IV)  3
(5) Immaturity (Scale V)                 Between 2 and 3


6. Hypothesis #5. The 47 male subjects in Group I were
matched for age with 47 randomly selected boys in Group II
(selection being random within age limits). These subjects
ranged in age from 8 to 12 years of age. The means for each
group were computed and differences noted. As previously, 
differences significant beyond the .01 level of confidence 
were sought. The analysis was to have been performed using 
female subjects, but with only 15 females in Group II, it 
was felt that insufficient numbers, here, would not yield
meaningful results. 

7. Hypothesis #6. A procedure identical to that for
Hypotheses #2 and #5 was performed on the individual factors.
Differences were sought beyond the .01 level of confidence.

8. Post Hoc Analyses

(a) Internal Consistency. The internal consistency
of each of the two groups, and the two groups combined,
was estimated using the Kuder-Richardson 20 correlation
method (see Ferguson (1971) p. 367). This method yielded
an overall estimate of the degree of internal consistency
within the groups and with the groups considered as one.


(b) Matched Group Comparisons with Walker's (1970) Group. 
Group II subjects were compared with a group of subjects
used by Walker (1970) to estimate contrasted groups validity.
The 94 Group II subjects and 38 subjects identified by
Walker as behaviorally disturbed were compared using a two-
tailed t-test in the comparison of means and an F-test in
the comparison of variance. This was done to determine if
any similarity in mean scores for these two groups existed
and if they could be considered as homogeneous in terms
of variance.

(c) Comparison of Matched Groups to Source Groups. Using
a t-test, the two sub-groups, matched for age and sex, were 
compared to their source groups. Means were compared to 
determine if the sub-groups were similar to the groups from 
which they were drawn. Obviously these groups should have 
been representative to some extent and this analysis was 
performed to determine if that representativeness did, in 
fact, exist. 


(d) Correlation of Group II Scores to Age and
Length of Time in Adaptation Classes

Checklist scores for Group II were correlated with age and
with length of time in Adaptation classes. This correlation
was computed to determine the relationship, if any, of these
two factors to overall checklist scores. Additionally, this
analysis could, it was felt, give some indication as to the
effectiveness of the Adaptation program.
V RESULTS


Preliminary Analysis 


Hierarchal Analyses. As a result of the rather 
complex structure and nature of the groups involved in this 
study, a number of potential sources of variance emerged. 
These sources of variance included variance due to real 
differences between the groups, variance due to school 
differences, variance due to differences in treatment or 
approach within the schools, variance due to teachers in 
schools being involved with different groups and variance 
within groups (the measure used as a basis for comparison). 

A Hierarchal analysis was conducted on the full 
checklist and on each of the five sub-scales, following a 
method similar to Winer (1962). 

The results of the Hierarchal analysis of the full 
scale are presented in Table 1.

In this analysis, two significant (beyond the .01 level
of confidence) sources of variance emerged. A moderately
significant teacher within treatment by school effect was
noted. This effect would seem to indicate that some of the
variance in checklist scores was due to variations among the
teachers dealing with the children in the two different
settings, i.e., Adaptation class vs. regular class. The reasons
for this will be dealt with in the discussion. The teacher
Table 1

Hierarchal Analysis - Full Scale

Source of         Sum of        Degrees of   Mean          F             Critical
Variance          Squares       Freedom      Squares       Ratio         (.01)

Treatment         5136.45         1          5136.45       61.94*        6.84
School            [illegible]     4          [illegible]   [illegible]   3.47
Teacher by
School            689.67          4          172.42        2.08          3.47
Teacher within
Treatment by
School            2400.69        10          240.07        2.89*         2.47
Within            [illegible]   120          82.93

* Significant beyond .01 level of confidence
within treatment by school effect is too large to totally
ignore and must be noted as a potentially confounding effect.

In the analysis of variance on the full scale, the
main source of variance is that resulting from what has been
labelled the treatment effect. This effect is a reflection
of differences between the two groups, since the two group
variances are compared. The ratio of variance attributable
to real differences compared to within groups variance, at
61.94 (crit. 6.84), is significant well beyond the .01 level
of confidence. This effect would, as a result, appear to
overwhelm the significant though moderate teacher within
treatment by school effect noted above.

The results of the hierarchal analysis of Scale I 
(Acting-Out) are presented below in Table 2. The only 
significant (beyond the .01 level of confidence) source of 
variance evident on Scale I was that attributable to 
differences between the groups of subjects. It appeared
from these results that a real difference existed between
the groups on this factor.
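
The F ratios reported in these hierarchal tables follow directly from the sums of squares and degrees of freedom: each mean square is a sum of squares divided by its degrees of freedom, and each F ratio is that mean square divided by the within groups mean square. The short Python sketch below recomputes the Scale I ratios from the figures reported in Table 2. The function and variable names are illustrative assumptions and are not part of the original analysis.

```python
# Sketch: recompute mean squares and F ratios for Scale I (Table 2).
# Sums of squares, degrees of freedom and .01 critical values are the
# figures reported in Table 2; all names here are illustrative only.

def mean_square(sum_of_squares, df):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / df

# Within groups mean square, used as the error term for every source.
ms_within = mean_square(3425.14, 120)

sources = {
    # source: (sum of squares, df, critical F at .01)
    "Treatment": (1131.46, 1, 6.84),
    "School": (99.76, 4, 3.47),
    "Teacher within Treatment by School": (457.42, 10, 2.47),
}

f_ratios = {}
for name, (ss, df, critical) in sources.items():
    f_ratios[name] = mean_square(ss, df) / ms_within
    flag = "significant" if f_ratios[name] > critical else "not significant"
    print(f"{name}: F = {f_ratios[name]:.2f} ({flag} at .01)")
```

Under these figures only the treatment effect exceeds its critical value, which agrees with the interpretation given for Scale I.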

Table 3 includes the results of the hierarchal 
analysis on Scale II (Withdrawal). As in Table 1, two 
sources of variance were noted. Also like the results 
found in Table 1, the significant sources of variance were 
the treatment (differences in class type) effect and the 


teacher within treatment by school effect. In this case,

Table 2

Hierarchal Analysis - Scale I

Source of         Sum of        Degrees of   Mean          F             Critical
Variance          Squares       Freedom      Squares       Ratio         (.01)

Treatment         1131.46         1          1131.46       39.64*        6.84
School            99.76           4          24.94         .87           3.47
Teacher by
School            [illegible]     4          17.54         .61           3.47
Teacher within
Treatment by
School            457.42         10          45.74         1.60          2.47
Within            3425.14       120          28.54

* Significant beyond .01 level of confidence
Table 3

Hierarchal Analysis - Scale II

Source of         Sum of        Degrees of   Mean          F              Critical
Variance          Squares       Freedom      Squares       Ratio          (.01)

Treatment         [illegible]     1          [illegible]   [illegible]*   6.84
School            [illegible]     4          [illegible]   [illegible]    3.47
Teacher by
School            [illegible]     4          1.41          .32            3.47
Teacher within
Treatment by
School            [illegible]    10          [illegible]   2.64*          2.47
Within            [illegible]   120          4.45

* Significant beyond .01 level of confidence
however, the teacher within treatment effect must be given
considerable attention as the treatment effect was
proportionately much less extensive here than in the full
scale. It appears that although the treatment effect
appeared greater, it was seriously confounded by the
teacher differences. The effect of the differences between
classes (treatment) could best be described, here, as
tentative because of the confounding teacher variance.

Table 4 shows the results of the hierarchal analysis
of Scale III (Distractability). As in Table 1, a significant
treatment effect occurred in regard to Scale III. Other
potential sources of variance were not significant beyond
the .01 level of confidence. It appears that real differences
between the two groups exist in terms of this scale. The
rather pronounced F ratio between treatment effect and
within groups effect suggests that in terms of this scale the
groups are considerably different.

Table 5 presents the results of the hierarchal
analysis performed on Scale IV (Disturbed Peer Relations).

A significant treatment effect (beyond the .01 level of
confidence) was obtained on this scale and, although not
significant beyond the .01 level of confidence, a moderate
teacher effect was noted. The treatment effect appeared to
be sufficiently powerful to account for the greatest amount
of the variance on this scale.
Table 4

Hierarchal Analysis - Scale III

Source of         Sum of        Degrees of   Mean          F             Critical
Variance          Squares       Freedom      Squares       Ratio         (.01)

Treatment         244.47          1          244.47        34.82*        6.84
School            78.50           4          19.62         2.18          3.47
Teacher by
School            [illegible]     4          [illegible]   2.88          3.47
Teacher within
Treatment by
School            145.78         10          14.58         2.08          2.47
Within            842.85        120          7.02

* Significant beyond .01 level of confidence
Table 5

Hierarchal Analysis - Scale IV

Source of         Sum of        Degrees of   Mean          F              Critical
Variance          Squares       Freedom      Squares       Ratio          (.01)

Treatment         [illegible]     1          [illegible]   [illegible]*   6.84
School            [illegible]     4          [illegible]   .94            3.47
Teacher by
School            [illegible]     4          6.77          1.13           3.47
Teacher within
Treatment by
School            142.99         10          14.30         2.39           2.47
Within            [illegible]   120          5.98

* Significant beyond .01 level of confidence
Table 6 presents the final hierarchal analysis
performed on the checklist, that of the fifth
factor (Immaturity). As in the previous tables, the
treatment effect appeared as the main source of variance,
being significant beyond the .01 level of confidence. None
of the other potential sources of variance differed
significantly from the within groups variance measure. The
difference between Groups I and II on the fifth factor
(Immaturity) appeared to be considerable and real.

A treatment effect was noted on all five factors of
the checklist as well as on the Full Scale. In none of the
scales was there a significant (beyond .01 level of confidence)
school or treatment within school effect. That is, when
variances by schools were compared, and when variances by
group within the schools were compared, there was not a
significant effect on the overall variance of the groups.
Moderate but significant teacher effects were noted on the
full scale of the checklist and on Scale II (Withdrawal).
Although this teacher effect did not seriously compromise
the treatment effect on the full scale, because of the
magnitude of the treatment effect, the same cannot be said
for Scale II. The relative value of statements regarding
this scale must be weighed against the effect of variations
among teachers in rating subjects on this scale. It appears
that very few definitive statements can be made regarding the
discriminative powers of this scale because of the teacher effect.
Table 6

Hierarchal Analysis - Scale V

Source of         Sum of        Degrees of   Mean          F             Critical
Variance          Squares       Freedom      Squares       Ratio         (.01)

Treatment         51.60           1          51.60         13.54*        6.84
School            [illegible]     4          [illegible]   [illegible]   3.47
Teacher by
School            [illegible]     4          9.05          2.38          3.47
Teacher within
Treatment by
School            [illegible]    10          4.34          1.14          2.47
Within            428.29        120          3.81

* Significant beyond .01 level of confidence
1. Hypothesis #1. It was hypothesized that the means
of the two groups currently under study would show differences
beyond the .01 level of confidence. According to Ferguson
(1971, p. 218), the performance of this analysis and the
hierarchal analysis may have been redundant. This
analysis encompassed all subjects under study, not merely a
selected group, in order to yield an accurate picture of the
extent of differences between the groups.

The mean for each group on the full scale was
computed and compared using a t-test for means from
independent samples. Table 7 (p. 56) presents the results of
this comparison. The full scale mean of Group II (Adaptation) was
significantly different from that of Group I (regular), with
differences significant beyond the .001 level of confidence.
The mean of the 94 subjects in Group II appeared very similar
to that of Walker's (1970) group of 38 experimental subjects
used to determine contrasted groups validity. This group,
like the Adaptation group in the current study, met certain
criteria (see Walker, 1970, p. 3) to differentiate them from
the control group used for comparison. It was this apparent
similarity which prompted certain further analyses which will
be discussed later.
Table 7

Comparison of Means

Scale    Group #    Mean          T-value          Critical
                                  (df = 186)       Value

Full     I          4.95          -8.261***        3.291
         II         16.71
I        I          1.28          -7.097***        3.291
         II         [illegible]
II       I          .64           [illegible]**    2.576
         II         1.49
III      I          [illegible]   -6.970***        3.291
         II         4.59
IV       I          .63           -4.346***        3.291
         II         2.20
V        I          .67           [illegible]***   3.291
         II         [illegible]

*** Significant beyond .001 level of confidence
** Significant beyond .01 level of confidence
Table 8

Comparison of Variance - Full Scale Only

Group    Standard         Variance    F         Critical F(.99)
         Deviation (S)    (S2)                  (93, 93)

II       11.93            142.40      2.95**    1.59
I        6.95             48.27

** Significant beyond .01 level of confidence
The variances were also compared, on the full scale,
in order to determine the homogeneity or heterogeneity of
the two groups. The data used and results obtained are
presented in Table 8 (p. 57). This comparison was made
using all the subjects, rather than the 140 subjects used in
the hierarchal analysis, to determine if differences in
numbers substantially altered the variance.

The variances of the two groups showed a difference
significant beyond the .01 level of confidence. These
results, showing that the variances of Groups I and II were
heterogeneous, and the apparent similarity of Walker's
experimental group to Group II, prompted further
investigation. The results of that investigation will be presented
later.
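
As a rough arithmetic check on the figures above, the t-value for the full scale can be recomputed from the reported summary statistics. The sketch below assumes 94 subjects per group and backs the Group I variance out of the reported F ratio; that step is an assumption made here for illustration and is not a figure read from the tables. With equal group sizes the pooled-variance t-test reduces to the form shown in the comments. All names are illustrative.

```python
import math

# Reported summary figures for the full scale; names are illustrative.
n = 94                      # subjects per group
mean_group_1 = 4.95         # regular classes
mean_group_2 = 16.71        # Adaptation classes
var_group_2 = 142.40        # Group II variance
f_reported = 2.95           # larger variance / smaller variance

# Assumption: recover the Group I variance from the reported F ratio.
var_group_1 = var_group_2 / f_reported

# t-test for independent means with equal n:
#   t = (M1 - M2) / sqrt((s1^2 + s2^2) / n)
standard_error = math.sqrt((var_group_1 + var_group_2) / n)
t_value = (mean_group_1 - mean_group_2) / standard_error

print(f"Implied Group I variance: {var_group_1:.2f}")
print(f"t = {t_value:.2f} with df = {2 * n - 2}")
```

The recomputed value falls very close to the reported t of -8.261; a small gap is to be expected given rounding in the published means and variances.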

2. Hypothesis #2. Table 7 (p. 56) presents the
comparison of group means on the sub-scales of the WPBIC. All
subjects were used in these comparisons.

On Scale I (Acting Out) differences between Groups I
and II appeared which were significant beyond the .001 level
of confidence. It was on this scale, measuring the extent of 
acting out behavior, that the greatest difference between 
the groups was noted. 

The analysis of Scale II (Withdrawal) means showed
that differences beyond the .01 level of confidence existed
between the two groups. Caution in attempting to interpret
these results appears necessary because of the findings of
the hierarchal analysis presented earlier.
Teacher effects confounded the results on this scale, so
that, although differences beyond the stipulated level of
confidence were found, these differences may have been
caused by other than real differences between the groups.

Mean differences between groups on Scale III 
(Distractability) were found to be beyond the .001 level 
of confidence. Based on this finding it would appear that 
Group I was significantly less distractable than Group II. 

Means on Scale IV (Disturbed Peer Relations), when
compared, showed that Group II's mean score on this factor
was significantly higher (beyond the .001 level of confidence)
than that of Group I.

Differences beyond the .001 level of confidence 
were also found between the group means in Scale V 
(Immaturity). 

The comparison of means showed that on all five
sub-scales, as well as on the full scale, Group II had higher
mean scores than did Group I, and that the difference was,
in all cases, significant beyond the .01 level of confidence.
All but the Scale II differences were significant beyond the
.001 level of confidence.

Scale II (Withdrawal), as the poorest indicator of the
scales presented here, still appeared to be a good indicator
of problem behavior along the dimension of withdrawal.
However, as will be noted later, its ability to discriminate
may have been weakened by outside factors and it may not have
been as adequate a scale as these initially presented data
would suggest.

3. Hypothesis #3. The distributions of the 188 subjects
on the full scale and sub-scales of the WPBIC were plotted
and the amount of overlap of each group was shown. The
percentage of misclassification was also calculated for each
group. "Misclassification", in this case, was taken to mean
the rating of Group I subjects above the respective critical
scores as defined by Walker (1970) on the full scale and
sub-scales, and the rating of Group II subjects below these
critical points. This does not necessarily imply that
these subjects were erroneously rated.

The distribution of subjects on the Full Scale is
shown in Figure 2. In order to keep this graph from becoming
unduly awkward or crowded, all scores above 0 were grouped
into intervals of 4. Therefore all subjects rated between
1 and 4, 5 and 8, etc. were grouped together. In Group I,
4 subjects were scored above a score of 21, the point
established by Walker (1970) as the dividing point between
problem and non-problem behavior. This represented 4.26%
of Group I. In Group II, 34 subjects, representing 36.17% of
the group, were rated at or above 21. This left 63.83% of the
group below this point.

Figure 2 - Distribution of Full Scale Scores

4. Hypothesis #4. Figure 3 presents the distribution of
subjects on Scale I. On this scale 4 subjects or 4.26% of
Group I were rated above a score of 7, which is the critical
score, according to Walker (1970). Of Group II, 39.36%
(37 subjects) were above this point, leaving 60.64% of the
group below this score. On the graph used for Figure 3,
each score is represented and the number of subjects at
each recorded. This procedure was also followed on the
remaining graphs.

Figure 4 presents the distribution of the subjects
on Scale II (Withdrawal). On this scale 5.32% (5 subjects)
in Group I were rated at or above the critical score of 5. 
Of the subjects in Group II, 14.89% (14 subjects) were scored 
at or above the critical score with 85.11% below this score. 
No clear discriminatory ability can be claimed here because 
of the confounding teacher effect which has affected the 
confidence placed in this scale on all of the analyses 
performed. 

On Figure 5, 8.51% (8 subjects) of Group I were
rated at or above the critical score of 6 while 37.23%
(35 subjects) of Group II were scored above the
critical score on Scale III (Distractibility). This left
62.77% of Group II below this critical score.


Figure 6, presenting the distributions on Scale IV
(Disturbed Peer Relations), shows 8.51% (8 subjects) of Group
I at or above the critical score of 3. On this scale 34.04%
(32 subjects) in Group II were at or above the critical score
of 3 while 65.96% of the subjects were below this score.

Finally, as shown on Figure 7, 10.64% (10 subjects)
of Group I scored above the critical score of 2 on Scale V
(Immaturity). Of Group II, 31.91% (30 subjects)
were rated above the critical score, while 68.09% of this
group were rated below the critical score.

In looking at these graphical results, it can be
noted that on each scale, the number of Group II subjects
above the critical score substantially exceeded the number
of Group I subjects. On Scale I (Acting Out) 9.25 times as
many subjects in Group II were rated above the critical score
as were Group I subjects. The ratios on the other scales
ranged between this ratio and a 2.79:1 Group II to Group I
ratio on Scale II (Withdrawal), the weakest scale in terms of
ability to discriminate between groups. The remaining Group
II to Group I ratios, including that for the full scale, were
8.5:1 on the full scale, 4.375:1 for Scale III
(Distractability), 3.99:1 for Scale IV (Disturbed Peer
Relations), and 2.99:1 for Scale V (Immaturity). The ratios
presented above were, to reiterate, the ratios of Group II
subjects rated at or above the critical score for a scale, to
Group I subjects above that score.
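
The percentages and ratios quoted above can be reproduced from the subject counts given in the text. The sketch below assumes 94 subjects in each group and uses the reported counts at or above each critical score; the dictionary layout and names are illustrative only.

```python
# Counts of subjects at or above Walker's critical score, as reported
# in the text; dictionary layout and names are illustrative only.
GROUP_SIZE = 94

counts = {
    # scale: (Group I count, Group II count)
    "Full": (4, 34),
    "I Acting Out": (4, 37),
    "II Withdrawal": (5, 14),
    "III Distractability": (8, 35),
    "IV Disturbed Peer Relations": (8, 32),
    "V Immaturity": (10, 30),
}

def percent(count, total=GROUP_SIZE):
    """Percentage of a group, rounded to two decimals."""
    return round(100 * count / total, 2)

ratios = {}
for scale, (group_1, group_2) in counts.items():
    # Ratio of Group II to Group I subjects above the critical score.
    ratios[scale] = group_2 / group_1
    print(f"{scale}: Group I {percent(group_1)}%, "
          f"Group II {percent(group_2)}%, ratio {ratios[scale]:.3f}:1")
```

Computed directly from the counts, the ratios for Scales II, IV and V come to 2.80, 4.00 and 3.00. The slightly lower figures quoted in the text (2.79, 3.99 and 2.99) presumably reflect rounding or truncation in the original calculation from percentages.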
Figure 7 - Distribution of Scale V Scores
5. Hypothesis #5. From each group, 47 subjects were
matched for age and sex. Male subjects ranging in age from
8 to 12 years were compared, with equal numbers from each
group at each age level. The mean scores on the full scale
were compared for statistical significance using a
two-tailed t-test with differences sought beyond the .01 level
of confidence. The results of this comparison are presented
in Table 9.

A difference, significant beyond the .001 level of
confidence, was found between the full scale means of the
groups used in this analysis. These results showed a 
pattern similar to those achieved using the full groups 
although the matched group means were somewhat higher than 
those of the full groups. 

6. Hypothesis #6. Using the scores of the subjects 
mentioned above, the mean scores on each scale were
compared for statistical significance beyond the .01 level
of confidence. A two-tailed t-test was used as above. The 
results are presented in Table 9. 

Differences, significant beyond the .001 level of
confidence, were noted on the means of Scales I (Acting Out),
III (Distractability) and IV (Disturbed Peer Relations). On
Scale V (Immaturity), a difference between the matched group
means was found which was significant beyond the .05 level of
confidence. This difference fell short of the confidence level
sought. Scale II failed to discriminate between the groups
even at the .05 level of confidence.


Post Hoc Analyses 

(a) Internal Consistency. Separate estimates of
reliability were obtained for each group. The results of a
Kuder Richardson Split-Half reliability measure, similar to
that performed by Walker, but where the relation between all
possible pairs was obtained, yielded a reliability coefficient
of .8117 for Group I (regular class) and .8143 for Group II.
Also, a similar measure was obtained for the pooled groups
to determine the level of consistency of the raters as a
group. This calculation yielded an overall reliability of
.8598. This was not an entirely satisfactory estimate of
reliability as the 20 raters were considered as 2 raters
when the estimates were made for the groups separately and
as one rater when the general measure was taken across the
groups.
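
For reference, the Kuder-Richardson formula 20 estimate can be written as r = (k / (k - 1)) (1 - sum(pq) / variance of totals), where k is the number of items, p is the proportion of subjects scored on an item and q = 1 - p. The sketch below assumes items scored 0 or 1 and uses a small made-up response matrix. It does not reproduce the study's data or any item weighting the checklist may use, and the function name is illustrative.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses: list of rows, one per subject, each a list of 0/1 items.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])

    # Variance of total scores (population form, dividing by n).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    # Sum over items of p * q, where p = proportion scoring 1.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 subjects rated on 4 items.
example = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
print(f"KR-20 = {kr20(example):.3f}")
```

A coefficient obtained this way is read as the proportion of score variance treated as true variance, so a value of .8598 corresponds to roughly 86% true variance and 14% error.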

The overall estimate of internal consistency across
the groups suggested that approximately 86% of the variance
obtained in the checklist scores was true variance while
approximately 14% of the variance was due to error (see
Ferguson, 1971, p. 365). Walker (1970) cited Lindquist:

With a reliability coefficient of .98, the
checklist is capable of making individual
separations among subjects with a considerable
degree of reliability as an r of .90 is the
minimum coefficient acceptable for the purpose
[illegible].

However, Lindquist (1950), in discussing the levels of
reliability based on the guidelines set by Kelley (1927),
said:


. . . making the assumption that for a test to
be useful, it must permit discriminations of a
difference as small as 0.26 times the standard
deviation of a grade group with chances 5 to 1 of
being correct, Kelley arrives at the following as
the minimum correlation for several purposes.

. . . (b) To evaluate differences in level of
group accomplishment in two or more performances
[illegible]. It must be recognized, however, that these values
are arbitrary, being derived from the above
assumptions as to what would be reasonable to
expect a test to do in the way of discriminations
between individuals and groups (p. 609).


Walker (1970) achieved a reliability coefficient of
.98 using 534 sets [illegible]. In examining each of
the groups separately, a mean reliability of about .813 was
estimated, while the combined group reliability coefficient
was .8598, using all 188 observations. As the sample size
increased, so did the estimate of reliability. As Ferguson
(1971) points out:


Low reliability does not necessarily invalidate
a technique as a device for drawing valid inferences.
Low reliability may be compensated for by increasing
sample size. . . . When significant results are
reported with an unreliable technique on a small
sample, the treatment applied is usually exerting a
gross effect (p. 373).
An overall reliability of .8598 cannot be considered
especially low. Neither did it approach Walker's coefficient
of .98. Observation of the trend of increase of the
reliability coefficients from that of the two groups
separately to that of the groups pooled suggested that with
an increased sample size, the coefficient of reliability would
increase to a point approximating Walker's figure. Increasing
the sample sizes for that purpose would have
provided little real benefit in terms of increased
reliability since the coefficient obtained was already of a
fairly high order.


(b) Matched Group Comparisons with Walker's (1970) 
Experimental Group 


The apparent similarity between the Group II mean and
that of Walker's (1970) experimental group of 38 behaviorally
disturbed subjects prompted a statistical comparison of
those two groups. The results of that comparison are
presented below. A two-tailed t-test for differences of
independent means was used following Ferguson (1971, p. 152).

As can be observed from Table 10, no significant
difference appeared on this comparison. It appeared that
Walker's (1970) group of experimental subjects was similar
to the group of 94 subjects in the Adaptation class group
used in the current study. No comparison of this sort was
possible on the five sub-scales as those data were not
presented in Walker (1970).
Table 10

Comparison of Means of 38 Subjects from Walker (1970)
and 94 Junior Adaptation Students

Scale    Group            Mean    T-value       Critical
                                  (df = 130)    Value

Full     Walker (1970)
For comparative purposes, the variance of the
Adaptation group in this study and that of Walker's (1970)
experimental group were also examined for statistically
significant differences. The results of that comparison
are presented in Table 11. The difference in variance
between the two groups was not statistically significant.
As in the case of the means, Walker's (1970, p. 3)
experimental group appeared quite similar to the Adaptation group
used in the present study.


(c) Comparison of Matched Groups to Source 
Groups I and II 


In order to determine if the two groups matched for
age and sex were similar or dissimilar to the groups
from which they were drawn, the means on the full scale of
the matched groups were compared to the means of
the respective groups from which they were drawn. As
previously, a two-tailed t-test was used for comparison
and the results are presented in Table 12.

The results presented in Table 12 indicated that
the two groups which were matched showed no significant
differences from the respective source groups. As noted
previously in regard to the matched groups, the WPBIC could
not distinguish between the groups on Scale II and could do
so only at a low level of confidence (.05) on Scale V.


(d) Correlation of Group II Scores with Age and 


Length of Time in Adaptation Classes 


A simple correlation was performed between checklist 
Table 11


Comparison of Variances of 38 Subjects from Walker (1970) 
and 94 Junior Adaptation Students 
Table 12

Comparison of Means - Matched Group Means
to Source Group Means

Scale    Group #               Mean     T-value       Critical
                                        (df = 139)    Value

Full     I - Full (N=94)       4.95     [illegible]   [illegible]
         I - Matched (N=47)    6.26
Full     II - Full (N=94)      16.71
total scores, age and the length of time each student in
Group II had been in a Junior Adaptation class. The mean
total checklist score was 16.702, the mean age was 128.574
months (standard deviation 13.356) and the mean length of
time in Adaptation classes was 15.308 months (standard
deviation 8.453).

The coefficients of correlation obtained showed
tendencies toward negative correlations although they were
not of sufficient magnitude to warrant a great deal of
emphasis. A correlation coefficient of -.143 was obtained
between the checklist total score and age, while a coefficient
of -.054 was obtained between the checklist total score and
length of time in the Adaptation program. These results
would indicate that in the case of the present group under
study, there is a minimal relationship between age and
behavior as rated on the WPBIC and between length of time in
Adaptation classes and behavior as measured on the WPBIC.
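
Whether coefficients of this size differ from zero can be checked with the usual t-test for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes n = 94 (the size of Group II) and a two-tailed .05 critical value of about 1.99 for 92 degrees of freedom; the function name and labels are illustrative.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_GROUP_2 = 94          # assumption: all Group II subjects were used
CRITICAL_T_05 = 1.99    # approximate two-tailed .05 value, df = 92

results = {}
for label, r in [("score vs. age", -0.143),
                 ("score vs. time in program", -0.054)]:
    t = t_for_correlation(r, N_GROUP_2)
    results[label] = t
    verdict = "significant" if abs(t) > CRITICAL_T_05 else "not significant"
    print(f"{label}: r = {r}, t = {t:.2f} ({verdict} at .05)")
```

Under these assumptions neither coefficient reaches the .05 level, which is consistent with the minimal relationship described above.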

To summarize briefly, it was found that Groups I and
II differed significantly on the full scale of the WPBIC and
on all of its sub-scales. A variance estimate using all of 
both Groups revealed that their variances were not 
homogeneous. 

The graphed distributions of the subjects showed
substantial differences in numbers of subjects above the
critical scores on all scales except, perhaps, on Scale II,
where the numbers, though different, were not
clearly dissimilar. Only 14.89% of Group II scored above
the critical score of 5 on Scale II (Withdrawal) while
5.32% of Group I were rated above this point. According to
Walker (1970), between 10% and 20% of a regular group could
be expected to have behavior disorders and, if true, even
Group II would have been similar to a regular group on
this scale.

When 47 boys from each group were matched for age
and compared on the full scale and all sub-scales, it was
found that differences beyond the .001 level of confidence
existed on all scales but Scale II and Scale V. Scale V revealed
differences significant beyond the .05 level of confidence;
Scale II (Withdrawal) showed no significant differences
between the matched groups. The matched groups, later
determined to be generally similar to their source groups,
followed the pattern of other analyses in that Scale II
appeared to be the weakest scale in terms of ability to
discriminate between groups. Scale V, with differences
beyond the .05 level of confidence, was also less able to
discriminate between groups in this analysis than in other
analyses.

It was determined that differences between Groups I
and II account for the greatest amount of the variance on
the full scale and Scales I, III, IV, and V. On the full
scale and Scale II, a teacher rating effect beyond the .01
level of confidence was also observed. Because of the
overwhelming effect of Group differences on the full scale,
the teacher effect, which was significant but low by
comparison, was de-emphasized. However, because of the only
moderate significance of the group differences on Scale II,
the teacher effect was considered to be of sufficient
magnitude to compromise the value of that scale in determining
the presence or absence of Withdrawal type behavior.

A measure of internal consistency was also estimated 
indicating that approximately 86% of the variance obtained 
in checklist scores was due to real variance while the 
remaining 14% could be attributed to error. 

When the members of Group II were compared to 
Walker's 38 experimental subjects it was found that Group II 
and Walker's group had similar means and variances. Also 
when the matched groups of the present study were compared 
to their source groups, they were found to be similar to 
the groups from which they were drawn.

A correlation between age, length of time in 
Adaptation classes and overall checklist scores in the case 
of Group II showed minor negative coefficients between 
scores and age and scores and length of time but these did
not prove to be significant.

VI DISCUSSION AND CONCLUSIONS


Discussion 

This examination of the Walker Problem Behavior 
Identification Checklist has revealed certain strengths and 
weaknesses in the checklist which will be discussed in some 
detail.

The comparisons of mean scores of the two groups 
suggest that real differences existed between the two groups 
of subjects in the present study. With mean score differences 
significant beyond the .001 level of confidence in the full
scale as well as on Scales I (Acting-Out), III (Distractability),
IV (Disturbed Peer Relations), and V (Immaturity)
with the higher mean score always associated with Group II, 
it appears that the instrument has detected differences, 
between the Groups, in the nature of manifest behaviors, and 
was useful in detecting instances of Acting-Out, Distracta- 
bility, Disturbed Peer Relations and Immaturity, as well as 
differences between the Groups in terms of overall behavior. 
A similar claim cannot be made for Scale II (Withdrawal) 
despite the fact that mean scores differed beyond the .01 
level of confidence. Scale II, as noted earlier, was subject
to a rather serious teacher effect which compromised much of 
the interpretative value of this scale. 

With the above noted exception the instrument
appeared quite strong in its ability to distinguish between
groups.

The analysis of variance using both full groups 
indicated that the two groups were not homogeneous and 
suggested the possibility that the groups could represent 
different populations. 

From the comparisons of means and variances, the 
evidence tends to support hypotheses #1 and #2.

When the hypotheses for the results of graphing the 
scores were developed, it was stated that a "misclassifica- 
tion" of greater than 15% for either group would raise 
questions about the value of the instrument. Within that 
frame of reference the value of the instrument is in serious 
doubt because the percentage of misclassified subjects in 
Group II ranged from 60.64% for Scale I to 85.11%
for Scale II with an average of 67.73% of subjects being 
"misclassified." This problem did not arise with Group I 
where percentages of "misclassification" ranged from 4.26% 
for the Full Scale and Scale I to 10.64% for Scale V with an 
average of 6.92% of the subjects being "misclassified." 
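
The percentages reported above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes that each group contained 94 subjects (188 divided evenly), and the subject counts shown are hypothetical values back-calculated from the reported percentages rather than figures taken from the data tables.

```python
# Assumption: 188 subjects divided evenly gives 94 subjects per group.
GROUP_SIZE = 94

# Limit set in the hypotheses: more than 15% "misclassified" raises questions.
MISCLASSIFICATION_LIMIT = 15.0


def percent(count, n=GROUP_SIZE):
    """Return count as a percentage of n, rounded to two decimals."""
    return round(100.0 * count / n, 2)


def exceeds_limit(count, n=GROUP_SIZE, limit=MISCLASSIFICATION_LIMIT):
    """True when the percentage of misclassified subjects is above the limit."""
    return percent(count, n) > limit


# Hypothetical counts, back-calculated from the reported percentages.
group2_below_cutoff = {"Scale I": 57, "Scale II": 80}
group1_above_cutoff = {"Full Scale": 4, "Scale I": 4, "Scale V": 10}

for scale, count in group2_below_cutoff.items():
    print("Group II", scale, percent(count), exceeds_limit(count))
for scale, count in group1_above_cutoff.items():
    print("Group I", scale, percent(count), exceeds_limit(count))
```

Under these assumptions every Group II figure exceeds the 15% limit while every Group I figure falls below it, which is the pattern described in the text.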

According to Walker (1970) up to 10 to 20% of 
students can be expected, in the regular class, to display 
serious behavior disorders, while Freehill (1973) states: 

From reviewing the data, White and Harris (1961)
concluded that a figure for a mild disturbance was
impractical but a working estimate for serious
maladjustment was between 2 and 12 percent. A
practical and widely used rate comes from Bowen
(1960) in California: 10 percent emotionally
handicapped and 2 or 3 percent in urgent need

The Group I subjects fell within these limits and could be 
considered to be fairly representative of a normal population.
This leaves the problem of the "misclassified" Group II
subjects to be dealt with.

Rather than calling the instrument a failure because 
of its leniency to Group II subjects, a discussion of
possible reasons for this apparent leniency would seem to be
more appropriate. 

First, it was decided that a normal sample should 
have 15% or less of its number classified as behaviorally 
disturbed if it were to be considered representative of a 
normal population. This condition was met in terms of Group 
I. It may have been more profitable to talk of Group II in 
terms of deviating from normalcy, rather than in terms of a 
prescribed percentage of misclassification. 

Greater than 15% of Group II were below Walker's 
(1970) critical score on all Scales of the Checklist. 
However, with the exception of Scale II (at 14.89% above the 
critical score) all the Scales showed at least 30% of the
Group above the critical score. This represented between 
2 and 3 times the percentage one would expect to be above 
the critical score if the Group represented a normal 
population. 

How, then, can the large percentage of "misclassified" 
Group II subjects be explained? Part of the role of the
Adaptation program must be to remediate and control problem
behavior. If this were not the case, why would public money
be spent in the maintenance of these classes, when other, 
less expensive means could be used to simply isolate these 
children? The classes are small (if the classes sampled here
are representative, no more than 11 or 12 students are found
in each class, with some classes as small as 6 or 7). This
allows for considerable amounts of individualized attention
and a resulting level of rapport between pupil and
teacher that would not be possible in larger, regular
classes. Because the classes are small, classroom routines and expectations
can be more individualized and, thus, more appropriate to 
the individual needs of the child. Behavior management 
techniques are possible at a useful level. Frustration can 
be lowered by adjusting educational demands to the pupil's 
abilities. 

It must be remembered, too, that most of these
students have been with their Adaptation teacher, or in an
Adaptation class, for a period of at least 7 months (one
exception being in Adaptation only 2½ months) at the
time of data collection, up to a period of 2 years, 7 months.
This being the case, if a good percentage of Adaptation students
were not moving to within normal limits of behavior, as a
result of the remedial aspects of the program, questions
about the usefulness of the Adaptation program would ensue.

program to re-integrate the exceptional child to the regular 
stream at the earliest possible opportunity. 

Within the limits of the original hypotheses, the 
instrument has failed to accurately discriminate between 
Groups I and II because greater than 15% of Group II subjects 
were rated below the various critical scores on the Checklist. 
Upon closer investigation it would appear that the limits of 
the hypotheses were too restrictive. On all but Scale II, 
more than twice the number expected to be rated as disturbed, 
in a normal population, were rated as behaviorally disturbed. 
That this number was not greater may be more in the way of a 
positive reflection of the Adaptation program, than a negative 
reflection on the usefulness of the Checklist. 

When the mean scores of 47 regular class boys were 
compared with the mean scores of 47 Adaptation class boys, 
it was found that these scores generally showed that the 
groups were significantly different. It was also determined 
that these two sub-groups were similar to the groups from 
which they were drawn. Two exceptions to the general 
differences were noted. 

As before, Scale II was the weakest indicator of 
differences between these groups with no statistically 
significant differences being noted. Scale V (Immaturity) 
showed differences significant beyond the .05 level of 
confidence but not beyond the .01 level. Very little can be
said about Scale II differences except that here either the
two groups were similar or the teacher variable accounted
for the low apparent differences. Scale V differences, or
only moderate presence of differences, may be explainable
in terms of a widely held notion that girls tend to mature
earlier than boys. The large number of girls in Group I,
compared to Group II, may have added to the apparent 
difference between the groups when full groups were compared, 
and the absence of girls in this particular analysis could 
have accounted for the lower order differences here on this 
scale. 

The matched group comparison results tend to be 
supportive of hypotheses #5 and #6. 

As revealed in the hierarchical analysis following
Winer's (1962) approach, the Checklist was not immune from
individual teacher differences as was evidenced by the
moderate but significant teacher effect on both the Full
Scale and on Scale II. In the case of the Full Scale
teacher effect, the problem was not acute because of the size
of the main effect differences, i.e. the difference between
regular class students and Junior Adaptation students, and 
the differences could be discussed as being real despite the 
teacher effect. Such cannot be said about Scale II 
(Withdrawal). The main effect differences, though greater 
than differences due to teacher effect, must be considered 
as being seriously compromised by the differences attributable
to differences among the teachers. As a result, it is
difficult, if not impossible, to attach interpretation to
the scale.

The question must be raised concerning the apparent 
weakness of Scale II (Withdrawal). As noted in the results,
not only was its value questionable as the result of the
hierarchical analysis, but it appeared as the most inconclusive
scale on all of the analyses applied to it and the other 
scales. Louttit (1957) shed some light on a probable cause 
for the weakness of this scale: 

Perhaps one of the least disturbing patterns of 
behavior . . . is that of shyness, seclusiveness or
withdrawal. Such behavior is generally not regarded
as a serious behavior problem, as indicated by
parent or teacher judgments of severity of behavior
problems. It is of interest to note that in
the cases studied by Martens and Russ (1932), 42 per- 
cent of the problem children and 52 percent of the 
non-problem children were found to be shy and 
bashful. This particular contrast further suggests 
that children who meet situations by withdrawal are 
not likely to be thought of as problems (p, LAER F 

The percentage of subjects in the present study rated as 
seriously withdrawn (5.32% of Group I and 14.89% of Group
II) came nowhere near that found by Martens and Russ (1932).
Even the number and percentage of subjects rated as
exhibiting any form of withdrawal was relatively low (20
subjects or 21.27% of Group I and 34 subjects or 36.17%
of Group II).

Based on Louttit's (1957) comments, the findings
of Martens and Russ (1932) and the results of this study,
it appeared that the weakness of the scale rating withdrawal
stemmed from the fact that withdrawal is subject to varying
degrees of perceived severity by teachers and is generally
not considered to be a particularly serious problem by
raters. This was further supported by the fact that
of the 50 items on the WPBIC only 5 measured this particular 
scale whereas between 10 and 14 items per scale were repre- 
sented on the remaining scales. This small number of items 
could also be considered as a source of weakness of this 
scale. Thorndike (1971) and Ferguson (1971) suggest that, to
increase the reliability of a test, one can lengthen it.
Since this appeared to be a problem in regard to Scale II,
it is conceivable that by increasing the number of items on
this scale, some increase in its reliability, seen in a
reduction of teacher related variance, could be expected. 
This, however, would be likely to have little appreciable 
effect on the number of subjects identified as severely 
withdrawn, if the findings of Louttit (1957) and Martens and 
Russ (1932) are true. 

It appears, then, that Scale II, because of its 
susceptibility to teacher differences and the fact that
withdrawal is often not seen as a problem, was a primary 
source of weakness in the WPBIC. 

Aside from the effect of teacher variance on Scale 
II, variance other than that due to real differences between 
groups did not appear to be a major factor. No significant
differences in variance were shown to be attributable to
inter-school differences or differences relating to
variations of treatment across the schools. In all scales
except Scale II, the treatment variance (differences between
Group I and Group II) was overwhelmingly the major source
of variance. As a result, it is felt that the comments made
about the value of the WPBIC based on the other analyses 
performed on the instrument are reasonably valid. 

Walker's (1970) estimate of the reliability of the 
instrument, at .98, appeared very impressive, especially 
when compared to the estimate of .8598 for the present 
study. However, the difference in reliability may have been
a function of group size (534 vs. 188) rather than a
function of differences between raters and subjects across
the two studies. As Ferguson (1971, p. 373) suggested, by
increasing the group size, one can expect an increase in the
estimate of the reliability of an instrument. Because of
this fact an estimate of reliability of .8598 for 188
subjects may be considered as adequate. The purpose of the 
study was to determine if differences between groups existed, 
so it was, in fact, a group survey. According to Thorndike 
(1971):

If a test is designed for individual diagnosis,
the test planner may wish to assure a reliability
of .90, whereas for group survey purposes he may
tolerate a reliability of .75 to .80 (p. 71).

With the present size of the sample, the estimate of
reliability of this study fell between that acceptable
for a group survey and that acceptable for individual
diagnosis. Based on this, the data from the present study appeared
acceptable for discussion on the basis of a group survey.
It is to be assumed that the WPBIC would not be the only
criterion used in identifying students with behavior problems. 
In this regard, the estimate of reliability suggested that 
the instrument had some value as a tool in the initial 
diagnosis of behavior disorders. 

The comparison of Group II to Walker's (1970) 
experimental group of 38 subjects yielded rather interesting 
results. It appeared, in those results, that Group II and 
Walker's experimental group could have come from the same 
population because of the close similarity of the mean 
scores and variances. Apparently, the two groups reflected 
the same overall behavior tendencies, a fact that could be 
used to support the generalizability of the instrument across 
populations. The probability of these similarities occurring 
due to chance would be very remote so it must be assumed 
that the subjects in the two groups were displaying generally
the same types of behaviors. 

The two groups matched for age and sex were compared 
to their source groups to determine if, in fact, they could 
be said to represent the source groups. The results 
indicated similarity between matched and source groups so 
statements regarding these groups could be considered to
apply to the source groups. As noted above, the matching
indicated a lessening of differences between Groups I and II
in terms of immaturity, and, as also noted above, this could
have been due to the absence of female subjects who may have
shown more relative maturity than their counterparts in
Group I especially.

The results of the correlation between age, time in 
Adaptation class and checklist scores were somewhat disturbing 
in view of what was said earlier regarding the role and 
purpose of Adaptation classes. The slight negative correlation
between age and checklist scores was disappointing due
to the expectation that as the subjects grew older they would
tend to show less disordered behavior. The coefficient of
correlation found between these factors was too small at -.143
to allow for definitive statements. The fact that it was a
negative correlation was promising because that trend, at
least, supported the expectation of the direction of the
correlation. However, it must be assumed that problem 
behavior is not age restricted to any significant degree. 

The most disturbing result in this analysis was
the lack of clear direction or tendency based on the length
of time spent in Adaptation classes. If the Adaptation class
were fully meeting the needs of behaviorally disturbed
children, there should have been a fairly high negative
correlation between the length of time spent in Adaptation
classes and the checklist scores. However, with a correlation
coefficient of -.054, although the direction of the
coefficient was acceptable, its strength was not. The lack
of a high negative correlation, here, could create some
concern as to the effect of the Adaptation classes on the
students involved in them.
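
Whether coefficients of this size differ reliably from zero can be illustrated with the usual t test for a Pearson correlation. The Python sketch below assumes that all 94 Group II subjects entered each correlation and uses an approximate two-tailed .05 critical value of 1.99 for 92 degrees of freedom. Both values are assumptions made for illustration and are not taken from the computations of the present study.

```python
import math

# Assumption: Group II contained 94 subjects (half of the 188 sampled).
N_GROUP_II = 94

# Approximate two-tailed .05 critical value of t for 92 degrees of freedom,
# read from a standard t table (an assumption, not a figure from the study).
T_CRITICAL_05 = 1.99


def t_for_r(r, n):
    """t statistic for testing that a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n, t_critical=T_CRITICAL_05):
    """True when |t| exceeds the chosen critical value."""
    return abs(t_for_r(r, n)) > t_critical


# Coefficients reported in the text.
r_age = -0.143
r_time_in_adaptation = -0.054

print(round(t_for_r(r_age, N_GROUP_II), 2), is_significant(r_age, N_GROUP_II))
print(round(t_for_r(r_time_in_adaptation, N_GROUP_II), 2),
      is_significant(r_time_in_adaptation, N_GROUP_II))
```

With these assumed values the t statistics are roughly -1.39 and -0.52, neither of which reaches the critical value, in keeping with the statement that the coefficients did not prove to be significant.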


Conclusions and Implications 

Based on the results and discussion presented here, 
certain conclusions can be drawn. 

First, based on the data, the checklist appeared to 
be able to detect real differences between groups represent- 
ing regular and Adaptation classes. The data supported the 
instrument's efficacy as a group survey instrument. With 
an increased sample size, one could, though possibly not
legitimately, claim its usefulness as an individual
diagnostic tool. The claim made by Walker (1970) of the 
instrument's usefulness on an individual basis may have been 
based on a reliability estimate that was enhanced by group 
size. This is not to say that Walker's claims regarding 
reliability were false, because even with a considerably 
smaller group than his, a coefficient of reliability of
nearly .86 was obtained. Even with the lesser of the two 
estimates, that is, the one found in this study, the instru- 
ment appeared to show fairly high reliability. 

Because of the ability of the instrument to detect
differences between groups, and because of the consistently
higher mean scores of the Adaptation subjects, it seems
reasonable to conclude that differences existed in the
relative nature of these groups. This statement is made
with the full awareness that variances among teachers had
significant effects on overall variance in the Full Scale
and on Scale II. In the case of the Full Scale this effect
was deemphasized due to the size of the Treatment (real
differences) effect on variance.

Scale II posed serious problems to the Checklist. 
Based on present data, Scale II (Withdrawal) would seem to 
have limited, if any, value for the purpose it was assigned. 
Observed differences were too small and rater (teacher) 
variations too great to allow this scale to be given any 
degree of real value. Teachers and other raters should be 
cautioned that because of these factors, and the low number 
of items measuring this scale, extreme caution would be 
advisable in interpreting the results of this scale. This is 
not to say that the value of the scale is non-existent, but 
rather, that the results were too inconclusive to merit its 
consideration as an effective way of determining the presence
of withdrawn behavior. 

On the remaining scales, the available data suggest
that the instrument was valid for the purpose for which it 
was designed. On each of Scales I, III, IV and V significant 
differences of a high order were found between the Groups 
sampled indicating that the checklist was sensitive to
differences along the factors as described.

The Adaptation group's similarity to Walker's experimental
group could be construed to lend support to the
generalizability of the checklist from Walker's group to the
present group. The remarkable similarity of means and variances
here is unlikely to have been due to pure chance. The
similarity would seem to enhance the image of the checklist 
as a device for detecting behavior problems at the specified 
age and grade levels. 

The results of the analyses comparing age and time in
Adaptation classes to checklist scores were inconclusive. No 
firm or definitive conclusions regarding these analyses are 
possible. Further research using all adaptation classes 
locally might provide more meaningful results. 

As a whole, then, the WPBIC appeared valid as a group 
survey instrument to determine the presence or absence of 
disturbed behavior in individuals within the classroom. As 
such, it could be useful as part of a screening process to 
identify behaviorally disturbed pupils. Because of
teacher related variance effects and weaknesses present due 
to Scale II, its interpretation should be done in general 
terms, with further study being conducted of each child 
identified as behaviorally disturbed on the checklist. 

Teacher biasing effects must be kept in mind. It
is conceivable that a regular class teacher would seek
normalcy in her students and an Adaptation teacher would seek
behavior problems to be identified. This possibility must be
recognized as a limitation of the study, because it could
conceivably have caused at least some of the differences 
between groups in the study. 

It must also be recognized that a full scale score 
of less than 21 does not mean the absence of problem behavior. 
On any given scale, a score above the cut-off point indicates 
a possible problem in that area. As such the instrument can 
be useful on a scale by scale basis. 

The relationship between academic achievement and
full scale scores (revised by Spivak and Swift (1973)) was
not dealt with due to the diversity of teacher recorded achievement
data.

Despite the directions for research still open regarding
the WPBIC and its limitations, it appears that the checklist
does have a degree of usefulness, as an initial screening
device, that could be of benefit in identifying or helping to
identify behaviorally disturbed children at the Grades 4, 5
and 6 level in the local school area.

WALKER PROBLEM BEHAVIOR IDENTIFICATION CHECKLIST (WPBIC)
MANUAL


INTRODUCTION 


This manual describes the construction, validation,
administration and scoring procedures of a behavior
checklist for the identification of children with behavior
problems. The checklist is designed for use in the
elementary grades and is standardized on grades 4, 5,
and 6. It is composed of observable, operational statements
about classroom behavior which were furnished
by a representative sample of elementary school
teachers. The checklist is to be used as a supplement
in the total identification process rather than as an
instrument to simply classify children as emotionally
disturbed or socially maladjusted. The WPBIC should
function as a tool which the elementary teacher can
rely upon in the difficult task of selecting children with
behavior problems who should be referred for further
psychological evaluation, referral, and treatment.


RATERS 


The classroom teacher is in a unique position to
identify children with behavior problems since she
spends much more time in actual observation of the
child than any other school personnel. Research studies
have demonstrated that teachers are capable of making
valid judgments about classroom behavior (1, 2, 7, 12).
The WPBIC consists of items which describe behaviors
that interfere, or actively compete, with successful
academic performance. The checklist is thus especially
suited for classroom teachers since according to Beilin
(1), teachers are most concerned with classroom behavior
which is disruptive of achievement. Since the
teacher is held responsible for the child's achievement
through the teaching-learning process, she should be
an excellent judge of classroom behavior which is incompatible
with academic performance. The teacher is
thus regarded as the most qualified rater in using the
WPBIC to identify children with behavior problems
who are in need of special educational-psychological
services. However, ratings from other educational specialists
such as counselors, remedial teachers, and
school psychologists, who have worked directly with
the child, can be obtained for purposes of comparative
analysis.


OBSERVATION PERIOD 


A two month observation period should precede
ratings of child behavior on the WPBIC. A sufficient
observation period increases the reliability and validity
of the teacher's ratings and also reduces the probability
that high magnitude, yet low frequency behaviors, such
as stealing, temper tantrums, and fighting, will be
missed by the rater.

The checklist can be used most efficiently if each
child is rated two months after school starts. In this
way, children who are in need of specialized educational
services, or those who should be referred for
further evaluation and treatment, can be identified early
in the school year. Additional behavioral problems
which may develop in individual children as the year
progresses can be rated on later WPBIC ratings.


DEVELOPMENT AND STANDARDIZATION 


Source of Items 


The fifty checklist items were drawn from teacher
descriptions of classroom behavior problems. A random
sample of thirty experienced teachers was drawn from
the population of fourth, fifth, and sixth grade teachers
in a local (Oregon) school district. The teachers were
then asked to nominate those children in their classes
who exhibited chronic behavior problems. Each
teacher was then interviewed and asked to describe
the child's behavior problem(s) and to give operational
descriptions of the behaviors that concerned them.
Observable descriptions of overt behavior were abstracted
from each interview, yielding an item pool of
three hundred items. Fifty of the most frequently mentioned
behaviors from this sample were selected for
inclusion in the checklist.


Derivation of Item Score Weights 


A panel of five behavioral scientists was selected
and assigned an item rating task for the purpose of
deriving score weights for individual scale items. The
five judges were asked to rate each item's weight or
influence in handicapping a given child's present adjustment.
Judges rated each behavioral item's influence on
a twenty point scale ranging from "of no importance"
to "great importance." The scale was a continuum on
which the judges could rate an item at any given point.
Judges' item ratings were pooled and averaged. Each
item was assigned an arbitrary score weight ranging
from four to one on the basis of such ratings. The
results of the rating procedure and the assignment of
score weights are presented in Tables 1 and 2. Since
the inter-judge reliability (ri) was .83, the means of the
five judges on all items were pooled and assigned as
score weights for the scale items. With this weighting
system, a subject can receive a high score of one
hundred and a low score of zero.
TABLE 1


Means, Standard Deviations, and
Inter-Judge Reliability (ri) for
All Judges on Fifty Scale Items


Judge    Mean           Standard Deviation
1        11.8           4.1
2        9.45           3.6
3        9.57           4.4
4        11.6           [illegible]
5        [illegible]    [illegible]

Inter-Judge Reliability (ri) = .83

TABLE 2

Score Weights, Number, and
Percentage of Items in Each
Category

Mean Score Range    Score Weight    Number    Percentage
15.0-16.0           4               6         12
13.0-14.45          3               8         16
[illegible]         2               10        20
[illegible]         1               26        52
Totals                              50        100





Normative Procedures 


Items selected and weighted were incorporated into
a behavior checklist and given to a twenty-one teacher
sample of 4th, 5th, and 6th grade teachers. The teachers
evaluated all pupils in their classes after having observed
them for approximately two months in the classroom
environment. Each subject evaluated on the
checklist received a marking of either present or absent
for each item. This procedure yielded scores on 534
children in the 4th, 5th, and 6th grades. The mean
score for the normative sample was 7.76 with a standard
deviation of 10.53.

For purposes of screening and identification, it was
necessary to select a point within the frequency distribution
(checklist score) which would separate disturbed
from nondisturbed children with an acceptable degree
of reliability and validity. However, as noted in Figure
1, the distribution of raw scores was positively skewed
and did not represent a normal distribution.


FIGURE 1 


Frequency Distribution of Raw Scores 
on Fifty Checklist Items for Grades 4, 5, and 6 
(N = 534)

Since the WPBIC is composed of fifty negative behaviors,
a positively skewed distribution would be expected
when the checklist is administered to a regular
school population. However, in a residential treatment
facility for severely disturbed children, the checklist's
application could conceivably result in a negatively
skewed distribution as high scores indicate possession
of a large number of deviant behaviors. Since behavioral
adjustment is considered to be normally distributed
in ordinary populations, the raw data on the
534 subjects were converted into a T score distribution
(See Table 3) so as to normalize the data and establish
separation points within the distribution.


TABLE 3 


Summary T Score Conversion 
Table for the Total Checklist 


T Score    Raw Score
90         50
80         41
70         31
60         21
50         11
40         1





A T score of 60, which is the equivalent of one
standard deviation above the mean, was established
as the point in the distribution for separating disturbed
from nondisturbed subjects. In using the WPBIC,
subjects who receive a raw score of 21 (T score of 60)
or above should be referred for a more intensive
behavioral analysis and evaluation.
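
The scoring and referral rule described above can be sketched in a few lines of Python. The sketch is illustrative only: the item numbers and the weights attached to them are hypothetical, since the actual weight of each checklist item is not reproduced here, and the function names are not part of the WPBIC.

```python
# Raw score equivalent to a T score of 60, as stated in the manual.
CUTOFF_RAW_SCORE = 21


def raw_score(items_present, weights):
    """Sum the score weights of the items the rater marked as present."""
    return sum(weights[item] for item in items_present)


def should_refer(score, cutoff=CUTOFF_RAW_SCORE):
    """True when the raw score is at or above the cut-off point."""
    return score >= cutoff


# Hypothetical weights (four to one) for a handful of item numbers.
example_weights = {1: 4, 2: 3, 3: 2, 4: 1, 5: 4, 6: 3, 7: 2, 8: 4}

# Two hypothetical ratings: sets of item numbers marked "present".
child_a = {1, 2, 5, 6, 8}        # weights 4 + 3 + 4 + 3 + 4
child_b = {1, 2, 3, 5, 6, 7, 8}  # the same items plus two items weighted 2

for name, items in (("A", child_a), ("B", child_b)):
    score = raw_score(items, example_weights)
    print(name, score, should_refer(score))
```

In this hypothetical case child A falls below the cut-off and child B falls above it, so only child B would be referred for further evaluation.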


RELIABILITY 


The reliability of the WPBIC was estimated by the
Kuder-Richardson split-half method. The instrument
was divided into equivalent split-halves by selecting odd
and even numbered items for inclusion in the two half
tests. In an effort to make the two halves of the checklist
more nearly equivalent and to reduce the response
bias which operates when a group of deviant behaviors
cluster together in serial form, items and their equivalent
score weights were distributed equally among the
two half tests. One behavior with a score weight of
four was assigned as item number fifty and another
behavior with a score weight of four was assigned as
item number one. This procedure was duplicated for
the remaining forty-eight items by alternately assigning
score weights of four, three, two, and then one to the
two halves of the checklist. The split-half reliability
coefficient obtained on the checklist was .98 with a
standard deviation of 10.53 and a standard error of
measurement of 1.28. A coefficient of .98 indicates
that 97% of the variance of checklist scores in the
sample was true score variance and 3% is error variance.
With a reliability coefficient of .98, the checklist
is capable of making individual separations among subjects
with a considerable degree of reliability as an r
of .90 is the minimum coefficient acceptable for this
purpose (6).


A formula was applied to the reliability coefficient
to determine the effect upon reliability of the WPBIC
of first doubling and then tripling its length. By this
formula a one hundred item checklist would yield an
r of .99 and a one hundred-fifty item checklist would
also yield an r of .99. The gain which would be realized
by doubling or tripling the length of the present checklist
would be .01. The WPBIC, in its present form,
appears to be at its optimum length with fifty items.
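
The manual does not name the formula it applied. The sketch below assumes it was the Spearman-Brown prophecy formula, the usual tool for predicting reliability after lengthening a test, and checks that a starting coefficient of .98 gives the doubled and tripled values reported above. The function name is illustrative only.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test is lengthened by a factor k.

    Assumption: this is the Spearman-Brown prophecy formula; the manual
    only says that "a formula" was applied.
    """
    return k * r / (1 + (k - 1) * r)


r_now = 0.98                          # split-half coefficient for 50 items
doubled = spearman_brown(r_now, 2)    # hypothetical 100-item checklist
tripled = spearman_brown(r_now, 3)    # hypothetical 150-item checklist

print(round(doubled, 2), round(tripled, 2))
```

Under this assumption both predicted coefficients round to .99, which matches the manual's point that lengthening the checklist would add very little.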


VALIDITY 


Four types of validity were estimated on the WPBIC
and are discussed here. The validity data were derived
from the original normative sample.


Contrasted Groups Validity 


In the contrasted groups method of assessing validity,
two independent groups are defined in relation to the
construct being measured and the instrument is then
administered to both groups. Differences between the
two groups in checklist score are then tested for statistical
significance (5). Two independent groups were defined
in relation to the construct of behavior disturbance.
Thirty-eight subjects in the 534 pupil sample were
identified as behaviorally disturbed according to one
or more of the following criteria: 1. has been examined
by a psychologist and referred to a psychiatric or
clinical facility, 2. specific educational provisions have
been made for the subject within the school setting
because of his behavior problem(s), 3. has received
instruction at home because of his inability to profit
from classroom instruction due to his behavior problem(s).
These thirty-eight subjects, so identified, were
matched with thirty-eight subjects from the normative
sample, not so identified, in terms of age, grade, and
sex. All pupils who matched the experimental subjects
in age, grade, and sex were lifted from the sample. A
table of random numbers was used to facilitate the
random selection of thirty-eight control subjects to be
paired with the experimental subjects for purposes of
experimental analysis.


The difference between the means of the experimental
and control subjects was significant beyond
the .001 level of confidence (See Table 4). Contrasted
groups validity can be reasonably claimed for the
WPBIC since behaviorally disturbed subjects received
significantly higher scores on the construct which the
checklist measures than did nonbehaviorally disturbed
subjects.


TABLE 4


Means and Standard Deviations of
Experimental and Control Groups
for Statistical Significance


          Experimental    Control
            (N = 38)      (N = 38)       D        CR

Mean         16.64          6.47       10.16     4.23*
SD           12.68          5.47

*Significant beyond .001 level


Criterion Validity 


A biserial correlation was computed on the normative
data to assess the degree of relationship which
exists between scores on the WPBIC and the construct
of behavior disturbance as measured by the three criteria
discussed above. If the checklist measures disturbed
behavior, then it appears reasonable to expect
that scores of subjects who have been referred to
psychiatric or clinical facilities or those who require
special educational provisions because of such behavior
problems should correlate higher with the criteria
of behavior disturbance than scores of subjects
who are judged not in need of such attention.

The biserial correlation between checklist scores
and the criterion yielded an r_bis of .68. The standard
error of this correlation is .039 and its index of predictive
efficiency is 33. The r_bis of .68 is significantly
different from zero at the .01 level. The predictive efficiency
index of 33 provides a measure of the checklist's
predictive value and indicates that the WPBIC has
utility in the prediction of behavior disturbance in
populations of elementary school children.


Factorial Validity 


Data obtained from administering the WPBIC to a
534 pupil normative sample were factor analyzed according
to a diagonalization method originated by
Jacobi and adapted by von Neumann for large computers,
Ralston and Wilf (11). The factors were then
subjected to a Varimax orthogonal rotation to obtain
a simple structure. This procedure yielded five factors
which are presented in Table 5 along with their constituent
items and factor loadings.


The results of this analysis are similar to the factors
obtained by Quay, Morse, and Cutler (10) on a sample
of emotionally disturbed children in special classes and
by Patterson (8) on a sample of children referred to a
child guidance clinic. This type of analysis is useful in
establishing the validity of an instrument since it provides
specific information about the content of a scale
(what the scale measures) and also provides for a more
detailed description of behavior through factorial, profile
analysis techniques.


The relationships which exist between the item
clusters that make up the five factors of the WPBIC
are presented in Table 6.
TABLE 5


Factors, Items, and Factor Loadings
for a Sample of 534 Public School Pupils


Factor 1 Acting Out (Disruptive, Aggressive, Defiant), 14 Items
Item Number ] 4 12 Wey MNS 2 PM XO ehh 32 OO Ome oO 
Factor Loading SS) (AE 5Omee>> Svea 49 63) F689 (60) 669) 645 9.39) 47477 


Factor 2 Withdrawal (Restricted Functioning, Avoidance Behavior), 5 Items
Item Number      15    29    37    42    45
Factor Loading  .54   [?]   [?]   [?]   .79


Factor 3 Distractability (Short Attention Span, Inadequate Study Skills, Non-Attendance), 11 Items 


10 
49 


Item Number 3 6 9 
49 30. =~36«81 


Factor loading 


13 
69 


50 
46 


4) 
67 


49 
79 


19 
4) 


24 
35 


14 
40 


Factor 4 Disturbed Peer Relations (Inadequate Social Skills, Negative Self-Image, Compulsive), 10 Items


Item) Number 5 if BSS PN AS BY NG ee ae} 
factor loading 50) 55 a6 HES fee dels BS! ie Ney SS) 
Factor 5 Immaturity (Dependent), 10 Items
Item Number 2 8 1] ih MAG ERE) Is} 36 44 47 
Factor loading 3256059) 69 67 79 E7s) 74> 235, 382 
TABLE 6
Inter-Correlations of the Five WPBIC Factors

                                 1.          2.           3.              4.            5.
                                                                     Disturbed Peer
                             Acting-Out  Withdrawal  Distractability    Relations    Immaturity

1. Acting-Out                    -          .02          .67             .48           .39
2. Withdrawal                   .02          -           .12             .18           .23
3. Distractability              .67         .12           -              .48           .44
4. Disturbed Peer Relations     .48         .18          .48              -            .34
5. Immaturity                   .39         .23          .44             .34            -


The correlations indicate that, with the exception of
item clusters one and three, there is very little overlap
among the five factors. The factors seem to be relatively
independent of one another. This suggests that
the WPBIC measures separate functions of the same
behavior domain (e.g., behavior disturbance).


The r of .67 between acting out syndrome and distractability
indicates that 44 per cent of either factor
is attributable to overlap or common factor variance.
The content of the items in each factor supports the
assumption that the two factors represent common
elements. In addition, acting out or hyperactive children
often manifest very high rates of non-attending and
distractive behavior (9, 13).
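
The overlap figure quoted for the two factors follows from squaring the correlation coefficient. The sketch below applies that to the six largest inter-factor correlations in Table 6. The dictionary layout and variable names are illustrative and not part of the manual.

```python
# Inter-factor correlations taken from Table 6 (upper triangle only).
table_6 = {
    ("Acting-Out", "Distractability"): 0.67,
    ("Acting-Out", "Disturbed Peer Relations"): 0.48,
    ("Distractability", "Disturbed Peer Relations"): 0.48,
    ("Distractability", "Immaturity"): 0.44,
    ("Acting-Out", "Immaturity"): 0.39,
    ("Disturbed Peer Relations", "Immaturity"): 0.34,
}

# Shared (common) variance between two factors is the squared correlation.
shared_variance = {pair: r ** 2 for pair, r in table_6.items()}

for pair, r2 in shared_variance.items():
    print(f"{pair[0]} / {pair[1]}: {r2:.2%}")
```

For Acting-Out and Distractability the squared value is .4489, in line with the figure of about 44 per cent given in the text. Every other pair shares less than a quarter of its variance.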


The normative sample of 534 subjects was scored
on the five factors in order to obtain normed scores
for each factor. These data were then converted to
T score distributions for each of the five factors, which
are the five scales of the checklist.





Item Validity


Item variance indices, item validity indices (See
Table 8), and item intercorrelations were computed on
all fifty items of the WPBIC. The maximum variance
(.25) which an item can have is the point at which the
item can make the greatest number of separations
among individuals. Garrett (3) recommends item variance
values of .24 to .25 for most educational test
items since it is desirable to make maximum separations
among individuals in terms of mental ability,
aptitude, and achievement factors. However, when constructing
an instrument which will separate a predetermined
portion of individuals from the total sample,
the .24 to .25 value for optimal selection of items does
not apply. With the WPBIC, it was important to select
items which were not so narrow or limited in scope
that they were useless for purposes of identification.
On the other hand, a behavior such as not paying
attention is so common and so general that it is probably
typical of most school children at one time or
TABLE 7


Distribution of Raw Scores, T Score, and
Cumulative Percentages


Scale 1 Scale 2 
1! 
i Acting-Out Wither awal 
Raw ¢ Raw Cum. 
| Score % Score | % 
| 
' 26 100 
| 25 99 
| 
i 2a 99 
a os | 
ee G22 99 Sls=— = 
21 98 14 100 
| 20 | 8 | 
| 13. | 99 
iy ll Gy 
Herat 97 
|! | | 
V7 96 12. ete 208 
11 98 
16 96 
15 95 10.) ae 497; 
1} | 
ie 94 
Ve te: 94 9 | 94 
|-- 12 - 94 - 8- --|--.93- 
| 
Pes 93 
! 10 92 7 | 91 
| 9 92 | 
6 | 88 
BS | Boel 
++ 5 —— 87 
| 7 89 | | 
| 6 B87 Lae Ba 
| 5, 85 
i 3 | 81 
1 WAU. 88 
i 3 81 
2 78 
| 2? 75 
| 1 76 
i} i] 74 | 
| 0 67 
| | 
| 0 | 70 
| | 
| | 
| X28 x 1.0 
| SD 479 Sib. 19 











(N= 534) 
Scale 3 
Distractability 
Rew Cum. 

Score % 
| 
| 
5 
13,. 1-100 
2a ie e239) 
11 98 
10 | 96 
| 
9 93 
8 91 
7 87 





5 | 81 
4 76 
3 71 
2 64 
1 50 
0 42 
X 263 
S Dee s.30 








Scale 4 
Disturbed Peer 
Relations 
Raw | Cum. 
Score | % 
i) 
| 
11 100 
10 | 99 
9 | 98 

| 
8 98 
| 
7 | 97 
6 97 
5 | 95 
| 
4 94 





| 
Pia | 86 
| 
| 
] 85- 
0 80 
Koes 
SD 216 


























| Scale 5 | 
immaturity } 
\] | 
{} | 
|| Raw Cum. 
| Score % 
! 
10 100 
| 
9. | 99: | 
| | 
| Se 4) 350 em 
| eee See 
| | 
7 97 || 
i} | i 
| | 
= | S | 
| 6 97 
| 5 96 
=| —— ee 
| 4 94 i 
| | {| 
| | \ 
| 
| 
| | 
3 92 |i 
| 1 
{| 
I | 
| | 
ie 88 || 
| 
| 
| 
| | | 
1 84 
1} 
| (| 83 
| 
\| 
| | 
] 
x 165 | 
S.D, 1.74 
i 





T Score 
another. This behavior's innocuous content and extremely
high frequency would, in all likelihood, negate
its value in the identification process. Since approximately
ten to twenty percent of school children have
serious behavior problems, a criterion for WPBIC item
selection, on the basis of variance indices, was established
at from .09 to .16. A value of .09 equals ten
percent possessing a behavior and a .16 is equal to
twenty percent possessing the behavior.
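
The variance figures quoted above are consistent with the variance of a present/absent item, p(1 - p), where p is the proportion of subjects showing the behavior. The sketch below assumes that formula, which the manual does not state explicitly, and reproduces the three reference values. The function name is illustrative.

```python
def variance_index(p: float) -> float:
    """Variance of a dichotomous (present/absent) item.

    Assumption: the manual's variance index is p * (1 - p), where p is
    the proportion of subjects for whom the behavior is checked.
    """
    return p * (1.0 - p)


for p in (0.10, 0.20, 0.50):
    print(f"p = {p:.2f} -> variance index = {variance_index(p):.2f}")
```

Ten percent gives .09, twenty percent gives .16, and the maximum of .25 is reached when half the subjects show the behavior. This is why the selection range for an instrument aimed at the most deviant ten to twenty percent sits well below Garrett's recommended .24 to .25.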


The range of item variance indices is from .00 to .21
and the item standard deviations range from .12 to .93.
Seventeen of the items have variance indices which fall
within the optimal range of .09 to .16 for the separation
of the disturbed segment of the school population
(approximately ten to twenty percent) from the remainder
of the school population. The remaining variance
indices fall either slightly below or slightly above
this range with the exception of items 33, 36 and 47.
The WPBIC items thus closely approximate the criterion
of .09 to .16 chosen for judging the variance
indices of individual items.


The intercorrelations among fifty scale items yielded
1,225 coefficients which ranged in magnitude from .00
to .83. With the exception of several items, the results
of this analysis confirm the hypothesis that the WPBIC
items are measuring separate functions of the same
behavior domain and are not excessively duplicating
one another's functions. This analysis also provides an
empirical basis for evaluating the teacher's judgment
of behavior problems in children. For instance, item
#35 reads, "Openly strikes back with angry behavior
to teasing of other children" and item #42 reads,
"Doesn't protest when others hurt, tease, or criticize
him." These two behaviors, by definition, would appear
to be incompatible within the same subject. These two
items intercorrelated at a value of -.03. Similarly,
item #6 reads, "Perfectionistic: Meticulous about having
everything exactly right" and item #7 reads, "Will
destroy or take apart something he has made rather
than show it or ask to have it displayed." These two
behaviors appear to be logically unrelated and the correlation
between them should be low. Items #6 and #7
intercorrelate at a value of .00. This result is especially
significant in view of the fact that adjacent items ordinarily
intercorrelate highly as a function of response
set. At the other extreme, items #9 and #49 both
measure distractive behavior and intercorrelate at a
value of .83. With this amount of duplication, either
item could perform the function of the other.


A biserial correlation between scale items and the
total score was computed, yielding a discrimination
index which is a measure of internal consistency between
individual items and test score. The specific procedure
involved the selection of upper and lower
groups, in terms of checklist score, according to Kelley's
(4) criteria for the validation of test items and then
correlating each item with total score, which served as
the criterion variable.
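
The group selection step can be sketched as follows, assuming Kelley's criterion means taking the upper and lower twenty-seven percent of subjects by total score, the percentage this manual cites for the item validities. The scores are made up for illustration, and the upper-minus-lower difference is a simple stand-in for the biserial coefficient the manual actually reports.

```python
def kelley_groups(total_scores, fraction=0.27):
    """Return (lower, upper) lists of subject indices by total score."""
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    return order[:k], order[-k:]


def discrimination(item_present, lower, upper):
    """Proportion checked in the upper group minus the lower group.

    Note: a plain upper-lower difference, not the biserial correlation.
    """
    p_upper = sum(item_present[i] for i in upper) / len(upper)
    p_lower = sum(item_present[i] for i in lower) / len(lower)
    return p_upper - p_lower


# Hypothetical data: ten subjects' total scores and one item (1 = checked).
totals = [0, 2, 3, 5, 8, 12, 15, 21, 27, 34]
item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

lower, upper = kelley_groups(totals)
d = discrimination(item, lower, upper)
print(lower, upper, d)
```

An item that is checked for every subject in the upper group and none in the lower group, as in this toy data, gets the largest possible difference of 1.0.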


TABLE 8 


Item Variance, Standard Deviation Indices, and 
Item Validity Indices on Fifty Checklist Items 





Item Number Variance Index 
1 | 12 
2 | 05 
3 it's) 
4 .08 
5 | (Onl 
Bain +09 
if 02 
8 04 
9 21 

10 14° 
11 Ol 
12 05 
13 VY, 
14 14 
15 13 
16 | 05 
17 .02 
18 .09 
19g 11 
20 04 
21 03 
22 ol 
8} 12 
24 12 
25 02 
26 02 
Pil 04 
28 03 
29 09 
30 : .04 
31 | 03 
Bye 05 
33 .00 


36 00 
S729 ale OG 
SB useellleas ole 
39. |. 07 
ACIS ease 
41 17 
42 08 
43 04 
44 01 
45 12 
46 04 
AT am tiin jee 
48 | 03 
AD. onl 21 
SOs 10 


*Significant at .05 level


| 














Standard 
Deviation | Validity Index 
69 | 67 
AT 19 
7/3 Pw GY. 
58 65 
lobund “ik 83 
| 
.60 09 
29° | a5 
Jeena end 
SF dum MY 
ee lh aul 
28 24 
50 49 
85 | .48 
76 | 165 
74 14 
| 
Cy is eee 
33 19 
63 59 
67 52 
45 38 
39 48 
22. 12 
33 39 
70 56 
28 40 
20 tn ited 9 
45 58 
43 | 48 
63 | 40 
43 57 
S601) 282 
50 60 
VF aie MO 
aoe 26 
7 2 be 
a2 
51 | 
73 
55 
48 
84 53 
ey |) te 
45 | 39 
bse lh we Sis) 
73 36 
44 59 
17 03 
36 15 
93 58 * 
66 32 


**Significant at .01 level
The item validity indices on the fifty items vary from
.03 to .67. The validity indices indicate that the individual
items correlate highly with the criterion (total
score) and that the items discriminate between subjects
in the upper and lower twenty-seven percent of the
sample in terms of checklist score. The item validities
suggest further that the items making up the WPBIC
constitute a very homogeneous, related set of behaviors
with the exception of items 33, 36 and 47, which have
indices of .00, .10, and .03 respectively.


EDUCATIONALLY RELATED VARIABLES 


Hypotheses were constructed to determine the effect
which non-behavioral but educationally relevant variables
have upon WPBIC scores of subjects in the study
sample. These variables include grade of student, sex
of student, and sex of rater.


TABLE 9
Sex Differences in Checklist Score on all Subjects

           Male        Female
         (N = 276)   (N = 258)   Difference   Critical Ratio

Mean       10.50        4.83        5.67         6.67**
SD         12.16        7.40

*Significant at .05 level    **Significant at .01 level


In Table 9, it can be seen that male students received
significantly higher scores on the WPBIC than
female students. This result is consistent with research
findings which have indicated that significantly higher
proportions of boys than girls are identified as behaviorally
disturbed (1). This finding also strengthens
the applicability of the scale for use with school populations
in that the checklist reflects sex differences in
behavior disturbance which are known to exist in such
populations.

In Table 10, the analysis indicates that sixth grade
students were rated as significantly less deviant than
either fifth or fourth grade students. There is no empirical
evidence, of which the author is aware, that
supports this finding. The result may be explained by
the fact that the difference obtained represents a type
one error, in that no actual differences exist between
the two groups even though the data appear to support
the opposite conclusion. If this explanation were correct,
then the null hypothesis would have to be accepted
instead of rejected for this mean difference. Since the
critical ratios between both fourth and sixth and fifth
and sixth grade subjects were significant beyond the
.01 level, this explanation is possible but highly improbable.
Another explanation may be that sixth grade
students are rated as less deviant than fourth and fifth
grade students because of some as yet unexplained and
unresearched maturational processes. A third possible
explanation may be that the teachers who rated sixth
grade students in this study were "easier" raters than
fourth and fifth grade teachers.
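
The manual does not give the formula behind its critical ratios. The sketch below assumes the classical large-sample form, the difference between two means divided by the standard error of that difference, and applies it to the grade means, standard deviations, and group sizes as they appear in Table 10. The function name is illustrative.

```python
from math import sqrt


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two independent means over its standard error.

    Assumption: SE = sqrt(sd1^2/n1 + sd2^2/n2); the manual does not say
    which variant it used.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


# Means, SDs and Ns as read from Table 10.
grade4 = (9.48, 11.26, 164)
grade5 = (8.72, 11.87, 196)
grade6 = (5.04, 7.28, 174)

cr_4_vs_5 = critical_ratio(*grade4, *grade5)
cr_4_vs_6 = critical_ratio(*grade4, *grade6)
cr_5_vs_6 = critical_ratio(*grade5, *grade6)

print(round(cr_4_vs_5, 2), round(cr_4_vs_6, 2), round(cr_5_vs_6, 2))
```

Under this assumption the three ratios come out near .62, 4.28, and 3.64. The first and last agree with the tabled values and the second is within rounding distance of the tabled 4.23. The two comparisons involving grade six exceed 2.58, the two-tailed normal value for the .01 level.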


In Table 11, no statistically significant differences
were found between male and female raters on their
ratings of all subjects. This result indicates, as would
be expected, that male raters did not rate subjects as
significantly more or less deviant than female raters.


In Table 12, an analysis of variance applied to the
means of subjects rated by a rater of the same sex and
subjects rated by a rater of the opposite sex yielded
an F ratio which was significant beyond the .01 level.
However, inspection of the respective means indicates
that male and female raters do not rate male subjects
in a significantly different fashion, nor do male and
female raters rate female subjects in a significantly
different fashion. Thus, a same sex bias did not appear
to operate in the ratings of teachers in this sample.
The major part of the variance is accounted for by
the fact that both male and female teachers rated male
students as significantly more deviant than female
students.


The analysis in Table 13 for sex differences across
grades four, five and six yielded an F ratio which is
significant beyond the .01 level. Inspection of the
means reveals that sex differences between male and
female subjects, in terms of checklist score, held constant
across the three grades. It should be noted that
even though sixth grade subjects were rated as significantly
less deviant than fourth and fifth grade subjects,
sex differences between male and female subjects in
grade six were statistically significant.


ADMINISTRATION 


After a minimum of two months of observing the
subject's behavior, the rating on the checklist may be
completed.

There are fifty statements describing the subject's
behavior. Each statement has a number ranging from
1 to 4, in one of the five columns to the right of the
statements.


The rating is accomplished by reading each of the
fifty statements and, for each statement which represents
a condition which is present, or correctly describes
the subject's behavior, circling the number adjacent to
that statement in one of the five columns to the right.
Do NOT circle any numbers for statements which represent
a condition which is absent, or does not correctly
describe the subject's behavior. Do NOT circle more
than one number for any one statement.


SCORING 


Each of the five vertical columns to the right of the
statements represents one of the five scales measured
by the checklist. The score for any one scale is the
arithmetic sum of the CIRCLED numbers in that
scale's vertical column. For example, if the numbers
which are circled in the first column are 4, 3, 1, 1, and
2, their sum would be "11." This "11" would be the
score for Scale 1 (Acting-Out), and would be entered
in the box at the bottom of the first vertical column.
TABLE 10


Grade Differences in Checklist Score on all Subjects


         Grade 4     Grade 5     Grade 6
        (N = 164)   (N = 196)   (N = 174)   F Ratio   Difference   Critical Ratio

Mean       9.48        8.72                               .76           .62
Mean       9.48                    5.04                  4.44          4.23**
Mean                   8.72        5.04                  3.68          3.64**
SD        11.26       11.87        7.28
                                             [?].23

*Significant at .05 level    **Significant at .01 level
(Note that the score is NOT a "counting" of the quantity
of circled numbers in a column, but IS the addition
of the values of the circled numbers.) The score for
each of the five scales is computed and entered in the
same manner as described for Scale 1.


The Total Score is computed by adding the five
scale scores and is entered in the box labeled "Total
Score" to the right of the five boxes for the scale
scores.


INTERPRETATION


TABLE 11 


Score Differences by Sex of Rater on all Subjects 


         Male Rater   Female Rater   Difference   Critical Ratio
          (N = 10)      (N = 10)


Xo pe Ni 843 | 
op | 10153" |) "10989 





If the subject receives a Total Score of 21 (T Score
of 60) or higher, he is classified as disturbed. If the
subject receives a Total Score of less than 21, he is
not classified as disturbed.
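
The scoring and classification rules can be sketched as below. The data layout, function name, and example ratings are hypothetical. Only the five scale names, the summing of circled weights, and the cut-off of 21 come from the manual.

```python
SCALES = (
    "Acting-Out",
    "Withdrawal",
    "Distractability",
    "Disturbed Peer Relations",
    "Immaturity",
)
CUTOFF = 21  # Total Score equivalent to a T Score of 60


def score_checklist(circled):
    """Score one completed checklist.

    `circled` maps a scale name to the list of circled weights (1 to 4)
    in that scale's column. Scale scores are sums of the weights, not
    counts of circled numbers.
    """
    scale_scores = {s: sum(circled.get(s, [])) for s in SCALES}
    total = sum(scale_scores.values())
    return scale_scores, total, total >= CUTOFF


# Hypothetical ratings; the Acting-Out column repeats the manual's example.
example = {
    "Acting-Out": [4, 3, 1, 1, 2],
    "Distractability": [2, 2, 3],
    "Immaturity": [3],
}
scores, total, classified_disturbed = score_checklist(example)
print(scores, total, classified_disturbed)
```

Here the Acting-Out score is 11, as in the manual's worked example, and the Total Score of 21 falls exactly on the cut-off, so the subject would be classified as disturbed.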


Interpretation of the scores on the WPBIC is facilitated
by using the Profile Analysis Chart (PAC) provided
on the checklist. It is helpful to complete the
PAC on all subjects, but it should definitely be completed
on all subjects with Total Scores of 21 or higher.

The PAC is readily completed by circling the score
for each of the five scales and connecting them with a
line to form the profile. Any score above the line
printed across the PAC at the T Score of 60 is considered
to be high in the behavioral area defined by
the items in that scale. Thus a child who is high in
Scale 1 (Acting-Out) would require a different intervention
program than one who is high in Scale 2
(Withdrawal).


TABLE 12 


Score Differences When Subjects Are Rated by a 
Rater of the Same Sex Versus a Rater of the Opposite Sex 


Significant at O5 level ‘Significant at O1 level 
Rating Comparisons N Xx 
MALE rates MALE 148 9 60 
FEMALE rates MALE 127 Te 7, 
MALt rates FEMALE 128 4 26 
FEMALE rates FEMALE 129 | 5.98 
MALE rates MALE 148 | 9.60 
FEMALE rates FEMALE 129 5 98 
MALE rates MAI t 148 | 9 60 
MAILE rates FEMALE 128 4 26 
FEMAIF rates FEMALE 129 5 98 
FEMALE rates MALE | 127 AN tay 
FEMALE. rates MALE 127 ie ih Lay 
MALE rates FLMALE 128) - 4 26 





| | 
| | 


"Significant at O5 level 


& 


S.D. | F Ratio Difference | Critical Ratio 
12.80 1:97 | 1.85 
11.04 | 

7.41 i721 189 
7.00 | 

| 12.80 i ra6e. ) ) seis 

| 2005 | | | 

\ \ } | 

| 1280 | 5.34 eS 

lS eee i | | 

| | | 

7.00 5.59 = 
kod ae | 

i } | 

heed .04 > 4 bavi | — 

| 7.41 | | | 

| | il GPews | 





Significant at O] level 
TABLE 13


Sex Differences on All Subjects by Grade Level


Grade of         Male                   Female
Subject      N    Mean     SD       N    Mean    SD     F Ratio   Difference   Critical Ratio

   4         87   12.02    [?]      77   6.62   9.00                 5.40         3.13**
   5        102   12.63   14.03     94   4.47   6.92                 8.16         [?]
   6         87    6.54    [?]      87   3.62   5.74                 2.92         2.87**
                                                          [?]

*Significant at .05 level    **Significant at .01 level
Appendix 2
Sample of the Walker Problem Behavior
Identification Checklist
Walker Problem Behavior Identification Checklist


by Hill M. Walker, Ph.D.


Published by


WESTERN PSYCHOLOGICAL SERVICES
PUBLISHERS AND DISTRIBUTORS
12031 WILSHIRE BOULEVARD
LOS ANGELES, CALIFORNIA 90025
A DIVISION OF MANSON WESTERN CORPORATION


Name:
Address:
Age:
Date:
School:
Grade:
Classroom:
Rated By:
Position of Rater:


INSTRUCTIONS


Please read each statement carefully and respond by circling the number to the right of the statement if you have observed
that behavioral item in the child's response pattern during the last two month period. If you have not observed the behavior
described in the statement during this period, do not circle any numbers (in other words, make no marks whatsoever if the
statement describes behavior which is NOT present).


Examples:


1. Has temper tantrums
2. Has no friends
3. Refers to himself as dumb, stupid, or incapable
4. Must have approval for tasks attempted or completed


Statements 1 and 4 are considered to be present while statements 2 and 3 are considered to be absent. Therefore, only the
numbers to the right of items 1 and 4 are circled, and the numbers to the right of 2 and 3 are NOT circled.


W-97A


Profile Analysis Chart (PAC)


Scales
Perfectionistic: Meticulous about having everything exactly right
Other children act as if he were taboo or tainted
Approaches new tasks and situations with an "I can't do it" response
Shuns or avoids heterosexual activities
Stutters, stammers, or blocks on saying words
Easily distracted away from the task at hand by ordinary classroom stimuli, i.e., minor movements
of others, noises, etc.

Complains about others' unfairness and/or discrimination towards him
Is listless and continually tired
Does not conform to limits on his own without control from others
Becomes hysterical, upset or angry when things do not go his way
Comments that no one understands him
Will destroy or take apart something he has made rather than show it or ask to have it displayed
Has difficulty concentrating for any length of time
Is overactive, restless, and/or continually shifting body positions
Apologizes repeatedly for himself and/or his behavior
Distorts the truth by making statements contrary to fact
Underachieving: Performs below his demonstrated ability level
Disturbs other children: teasing, provoking fights, interrupting others
Tries to avoid calling attention to himself
Makes distrustful or suspicious remarks about actions of others toward him
Reacts to stressful situations or changes in routine with general body aches, head or stomach aches,
nausea
Argues and must have the last word in verbal exchanges
Has nervous tics: muscle-twitching, eye-blinking, nail-biting, hand-wringing
Habitually rejects the school experience through actions or comments
Has enuresis (Wets bed)
Utters nonsense syllables and/or babbles to himself
Continually seeks attention
Comments that nobody likes him
Repeats one idea, thought, or activity over and over
Has temper tantrums
Refers to himself as dumb, stupid, or incapable
Does not engage in group activities
When teased or irritated by other children, takes out his frustration(s) on another inappropriate
person or thing
Has rapid mood shifts: depressed one moment, manic the next
Does not obey until threatened with punishment
Complains of nightmares, bad dreams
Expresses concern about being lonely, unhappy
Openly strikes back with angry behavior to teasing of other children
Expresses concern about something terrible or horrible happening to him
Has no friends
Must have approval for tasks attempted or completed
Displays physical aggression toward objects or persons
Is hypercritical of himself
Does not complete tasks attempted
Doesn't protest when others hurt, tease, or criticize him
Steals things from other children
Does not initiate relationships with other children
Reacts with defiance to instructions or commands
Weeps or cries without provocation
Frequently stares blankly into space and is unaware of his surroundings when doing so
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